How to find similarity between two graphs in Neo4j?

Hello,
I want to import 2 separate graphs into Neo4j and then to find the similarity between them. How to do that?
I had a look at the Jaccard Similarity procedure and it seems that the 2 graphs should be connected in some way.
But what if they are not?

Also, I have found that one of the algorithms used for graph similarity/ isomorphism is Maximum Common Subgraph. Is there a way to use it in Neo4j?

Thanks in advance.

3 Likes

"Graph similarity involves determining the degree of similarity between these two graphs (a number between 0 and 1). Intuitively, since we know the node correspondences, the same node in both graphs would be similar if its neighbors are similar (and its connectivity, in terms of edge weights, to its neighbors). Again, its neighbors are similar if their neighborhoods are similar, and so on."

You need to share some elements between the two graph for example if you're talking about customers and e-costumers there are some properties that share like gender, zip, contact number. etc
maybe you could share a little more of info about the graphs and the model that you want to do.

kind regards, Roberto

1 Like

Can you explain a bit more about what you mean by similarity between 2 graphs?
Those OOTB similarity algorithms (e.g., jaccard) are meant to be used to find similarity between nodes. You might be able to use them for other purposes though.

1 Like

Thanks Roberto for the reply and for the document.
My goal is to have two separate graphs (and I know previously that there similar ) and I want to calculate the overall similarity or change that happens from one to another.
Example:


Here, I have a chemical compound 1 with nodes as (H,C, etc) and edges as bonds (Ionic, etc). which means I have heterogeneity in nodes and edges. Then I have another chemical compound 2 (assumably) almost similar, with small changes ( e.g. changing one H node to C node, same as for the edges)
My goal is to compare these two standalone graphs and to come up with a percentage of similarity.

Hope that my idea is clear.

Thanks in advance.
Ahmed.

2 Likes

Thanks shan for your reply.
By the similarity between 2 graphs: can be thoughts also as is I have two identical standalone graphs and then I tweaked some nodes/edges here and there. So, I want to measure how similar this new (tweaked) version of the graph to the original one.
I have also mentioned an example in my second reply above.

Hope this will help in clarifying my idea.

Thanks in advance.
Ahmed

1 Like

If you want to compare the degree of overlap in relationship sequences, then you could check out the BLAST algorithm from bioinformatics (BLAST (biotechnology) - Wikipedia)

You can also represent your graphs as contingency matrices and compare those using a singular value decomposition of the contingency matrices for example. That doesn't take into account the relationship sequences however.

Yet another idea is to do a graph 2 vec (DeepWalk: Implementing Graph Embeddings in Neo4j). This allows you to compare nodes based on their contextual information (i.e. relationship neighborhood similarity).

Just some ideas...

1 Like

Great ideas @niclasko
Thank you very much.
I will try them and will put my feedback here.

1 Like

A new graph algorithm was released yesterday that might help you with your usecase:

The Node Similarity algorithm compares a set of nodes based on the nodes they are connected to. Two nodes are considered similar if they share many of the same neighbors. Node Similarity computes pair-wise similarities based on the Jaccard metric.

2 Likes

That's great!
Thank you very much, shan
I am gonna use it for sure.

1 Like

Use geometric deep learning to correctly classify sub-graph patterns.

https://docs.dgl.ai/tutorials/basics/4_batch.html

Once classified, you just search for any sort of subgraph pattern and you are effectively done.

Beyond that, you can perform link/relation prediction with DGL to figure out which molecule is most likely to change state e.g. changing one H node to C node int order to bind to another molecule depending on its context.

https://docs.dgl.ai/tutorials/basics/1_first.html

In case you are short on structures to classify, you can just generate molecules structures similar but slightly different to your target structure by using GAN's.

3 Likes

Awesome!
Very useful resources and ideas.
Thank you very much marvin-hansen.
I will back with my feedback after trying them.

@ahmedmelmoselhy - if you're working in the chemistry space, check out the neo4j/rdkit integration project: GitHub - rdkit/neo4j-rdkit

That should let you do structure/substructure and similarity searches using RDKit's framework.

Neo4j doesn't ship any out-of-the-box graph isomorphism or graph edit distance algorithms, but the RDKit tools should get you where you need to be :slight_smile:

1 Like

okay. :slightly_smiling_face:
I am trying the other mentioned techniques and may use it as well.
thanks alicia.frame
I may also try to implement some common subgraph algorithm and hope it will work.

Also you could try put the properties of the atoms in the relationships and use it as a weights. And use something like page rank