Hello,
I want to import 2 separate graphs into Neo4j and then to find the similarity between them. How to do that?
I had a look at the Jaccard Similarity procedure and it seems that the 2 graphs should be connected in some way.
But what if they are not?
Also, I have found that one of the algorithms used for graph similarity/ isomorphism is Maximum Common Subgraph. Is there a way to use it in Neo4j?
You need to share some elements between the two graph for example if you're talking about customers and e-costumers there are some properties that share like gender, zip, contact number. etc
maybe you could share a little more of info about the graphs and the model that you want to do.
Can you explain a bit more about what you mean by similarity between 2 graphs?
Those OOTB similarity algorithms (e.g., jaccard) are meant to be used to find similarity between nodes. You might be able to use them for other purposes though.
Thanks Roberto for the reply and for the document.
My goal is to have two separate graphs (and I know previously that there similar ) and I want to calculate the overall similarity or change that happens from one to another.
Example:
Here, I have a chemical compound 1 with nodes as (H,C, etc) and edges as bonds (Ionic, etc). which means I have heterogeneity in nodes and edges. Then I have another chemical compound 2 (assumably) almost similar, with small changes ( e.g. changing one H node to C node, same as for the edges)
My goal is to compare these two standalone graphs and to come up with a percentage of similarity.
Thanks shan for your reply.
By the similarity between 2 graphs: can be thoughts also as is I have two identical standalone graphs and then I tweaked some nodes/edges here and there. So, I want to measure how similar this new (tweaked) version of the graph to the original one.
I have also mentioned an example in my second reply above.
If you want to compare the degree of overlap in relationship sequences, then you could check out the BLAST algorithm from bioinformatics (BLAST (biotechnology) - Wikipedia)
You can also represent your graphs as contingency matrices and compare those using a singular value decomposition of the contingency matrices for example. That doesn't take into account the relationship sequences however.
Yet another idea is to do a graph 2 vec (DeepWalk: Implementing Graph Embeddings in Neo4j). This allows you to compare nodes based on their contextual information (i.e. relationship neighborhood similarity).
A new graph algorithm was released yesterday that might help you with your usecase:
The Node Similarity algorithm compares a set of nodes based on the nodes they are connected to. Two nodes are considered similar if they share many of the same neighbors. Node Similarity computes pair-wise similarities based on the Jaccard metric.
Once classified, you just search for any sort of subgraph pattern and you are effectively done.
Beyond that, you can perform link/relation prediction with DGL to figure out which molecule is most likely to change state e.g. changing one H node to C node int order to bind to another molecule depending on its context.
In case you are short on structures to classify, you can just generate molecules structures similar but slightly different to your target structure by using GAN's.
okay.
I am trying the other mentioned techniques and may use it as well.
thanks alicia.frame
I may also try to implement some common subgraph algorithm and hope it will work.