To merge similar named nodes in neo4j

kaveri.malviya · July 21, 2019, 8:33am

I am trying to merge nodes in Neo4j based on their names and relationships. For example, 'entity' nodes have names like "hongkong" and "hnkg", and both have the same relationships, which means they are duplicate entries in the graph database.

Can the usual text functions in Cypher, such as Levenshtein distance, fuzzy match, or Sørensen-Dice similarity, be used to measure the similarity and merge nodes based on a cutoff?

Or is a plugin such as GraphAware required for this kind of named entity resolution?

ameyasoft · July 23, 2019, 8:35pm

Here is one solution:

I created test nodes and relationships:
MERGE (a:Enty {name: "hongkong"})
MERGE (a1:Enty {name: "hnkg"})
MERGE (a2:Enty {name: "japan"})
MERGE (a3:Airport {name: "HKG"})

MERGE (a)-[:AIRPORT]->(a3)
MERGE (a1)-[:AIRPORT]->(a3)
MERGE (a2)-[:AIRPORT]->(a3);

Result:
[Screenshot: mergnd]

With apoc.text.jaroWinklerDistance you can identify all nodes whose name is similar to a given name, here a.name = "hongkong". I set the condition jaroWinklerDistance >= 0.75.

Test it with this query:

MATCH (a:Enty {name: "hongkong"})
MATCH (b:Enty) WHERE apoc.text.jaroWinklerDistance(a.name, b.name) >= 0.75 AND b.name <> a.name
RETURN *

Result:
[Screenshot: mergnd2]

Notice that this does not return the node with name = "japan".
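
The original question also asked about Levenshtein, fuzzy match, and Sørensen-Dice. As a rough sketch, the query below returns several APOC text scores side by side for the test data above, which may help when picking a measure and a cutoff. It assumes the APOC library is installed. The column aliases are only illustrative, and the right cutoff for each measure is something to work out on your own data. It may also be worth checking what jaroWinklerDistance returns for two identical strings in your APOC version, because the direction of the threshold depends on whether the score behaves as a similarity or as a distance.

```cypher
// Compare several APOC text measures against the reference name.
// Read-only: this query does not modify the graph.
MATCH (a:Enty {name: "hongkong"})
MATCH (b:Enty)
WHERE b.name <> a.name
RETURN b.name AS candidate,
       apoc.text.jaroWinklerDistance(a.name, b.name) AS jaroWinkler,
       apoc.text.levenshteinSimilarity(a.name, b.name) AS levenshtein,
       apoc.text.sorensenDiceSimilarity(a.name, b.name) AS sorensenDice,
       apoc.text.fuzzyMatch(a.name, b.name) AS fuzzyMatch
ORDER BY jaroWinkler DESC;
```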

Now merge the similar nodes:

MATCH (a:Enty {name: "hongkong"})
MATCH (b:Enty) WHERE apoc.text.jaroWinklerDistance(a.name, b.name) >= 0.75 AND b.name <> a.name
WITH head(collect([a,b])) as nodes
CALL apoc.refactor.mergeNodes(nodes,{properties:"combine", mergeRels:true}) yield node
MATCH (n)-[:AIRPORT]->(c)
RETURN n, c;

Result:
[Screenshot: mergnd3]
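
Note that head(collect([a,b])) keeps only the first matching pair, so the query above merges a single duplicate per run. If several names pass the cutoff, a variant along the following lines might merge all of them in one call. It is a sketch under the same assumptions (test data as above, APOC installed) and should be tried on a copy of the data first.

```cypher
MATCH (a:Enty {name: "hongkong"})
MATCH (b:Enty)
WHERE b.name <> a.name
  AND apoc.text.jaroWinklerDistance(a.name, b.name) >= 0.75
// Collect every match; the reference node goes first so the others are merged into it.
WITH a, collect(b) AS duplicates
CALL apoc.refactor.mergeNodes([a] + duplicates, {properties: "combine", mergeRels: true}) YIELD node
// Return only the merged node and its airports.
MATCH (node)-[:AIRPORT]->(c)
RETURN node, c;
```

With properties: "combine", the merged node keeps the differing name values as a list, so the surviving node would carry both spellings.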

Please test this in a test environment before you apply this to a production database. Hope this works for you.
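
One way to check before merging is a read-only preview that lists which nodes would be merged and which relationships they share with the reference node. The sketch below additionally requires the candidate to point at the same Airport node, which reflects the "same relationships" condition from the question. The aliases keep, mergeCandidate, and sharedAirports are only illustrative.

```cypher
// Read-only preview: nothing is modified.
MATCH (a:Enty {name: "hongkong"})-[:AIRPORT]->(shared:Airport)<-[:AIRPORT]-(b:Enty)
WHERE b.name <> a.name
  AND apoc.text.jaroWinklerDistance(a.name, b.name) >= 0.75
RETURN a.name AS keep,
       b.name AS mergeCandidate,
       collect(shared.name) AS sharedAirports;
```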
