Jaccard and merging similar nodes

(Nael) #1

Hi everyone,
I hope you are enjoying Neo4j as much as I do.

I am working on a graph where I have nodes with similar relationships. To minimize the number of relationships in the graph, I am thinking of keeping only one node from every group of similar nodes, connecting all the other nodes in the group to that node with a direct relationship, and changing the label of those other nodes to (:Synonym).

As an example: suppose 9 nodes are similar to a certain node with a Jaccard score of 1.0. If each of these 10 nodes has 10 relationships, I have 10 nodes with 100 relationships. I want to relabel the 9 similar nodes as Synonyms and have each of them refer only to the main node.
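
To make the numbers above concrete, here is a minimal in-memory sketch in Python, not Neo4j code. It assumes similarity is the Jaccard score of each node's neighbor set; the names `jaccard` and `merge_synonyms` and the node ids are hypothetical. It keeps the first node of each similar group and points every other node in the group at it with a single synonym link.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def merge_synonyms(neighbors, threshold=1.0):
    """neighbors maps node id -> set of neighbor ids.

    Returns (kept, synonym_of): kept maps each main node to its neighbor set,
    synonym_of maps each relabeled node to the main node it refers to.
    """
    kept = {}
    synonym_of = {}
    for node, nbrs in neighbors.items():
        # Find the first already-kept node that is similar enough.
        main = next(
            (m for m, m_nbrs in kept.items() if jaccard(nbrs, m_nbrs) >= threshold),
            None,
        )
        if main is None:
            kept[node] = nbrs          # first of its group: stays as the main node
        else:
            synonym_of[node] = main    # would become (:Synonym) pointing at main
    return kept, synonym_of


# 10 nodes, each related to the same 10 targets (Jaccard 1.0 between any pair).
targets = {f"t{i}" for i in range(10)}
neighbors = {f"n{i}": set(targets) for i in range(10)}

before = sum(len(s) for s in neighbors.values())
kept, synonym_of = merge_synonyms(neighbors)
after = sum(len(s) for s in kept.values()) + len(synonym_of)
print(before, after)  # prints: 100 19
```

With 10 identical nodes this goes from 100 relationships to 19: the 10 kept on the main node plus 9 synonym links.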

Has anyone done that before? Any idea how to code it?

Thank you


(Neo4j) #2

This is not a direct answer to your question, but it's an answer to a question you may not have known to ask. You can now do 'projections' of your graph. You could keep the granular detail of the data presently captured in your graph, but for display you can go beyond clustering by compressing each cluster into a single node. It's only for reporting; it doesn't alter the data in your graph. It works the same way as the command "call :schema", but you can aggregate all nodes with similar characteristics into a single node and provide a count on that node, or a sum, or an average value.
Check it out here: http://neo4j-contrib.github.io/neo4j-apoc-procedures/3.5/virtual/#_nodes_collapse
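
As a rough illustration of what collapsing does (a plain Python sketch, not the APOC procedure or its signature), the hypothetical `collapse` function below groups nodes by a shared property and returns one summary record per group with a count, sum, and average. The property names `kind` and `weight` are made up for the example, and the input list is left untouched.

```python
from collections import defaultdict


def collapse(nodes, key, value):
    """Group nodes by `key` and summarize the numeric property `value`."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node[key]].append(node[value])
    # One summary record per group; the original nodes are not modified.
    return [
        {key: k, "count": len(v), "sum": sum(v), "avg": sum(v) / len(v)}
        for k, v in groups.items()
    ]


nodes = [
    {"id": 1, "kind": "a", "weight": 2},
    {"id": 2, "kind": "a", "weight": 4},
    {"id": 3, "kind": "b", "weight": 10},
]
summary = collapse(nodes, "kind", "weight")
print(summary)
```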