Merging similarly named nodes in Neo4j

I am trying to merge nodes in Neo4j based on their names and relationships. For example, an 'entity' node appears under names like "hongkong" and "hnkg", and both nodes have the same relationships, which means they are duplicate entries in the graph database.

Can the usual text functions in Cypher, such as Levenshtein distance, fuzzy match, or Sørensen-Dice similarity, be applied to measure the similarity and merge nodes based on a cutoff?

Or is a plugin such as GraphAware required for this kind of named entity resolution?

Here is one solution:

I created test nodes and relationships:
MERGE (a:Enty {name: "hongkong"})
MERGE (a1:Enty {name: "hnkg"})
MERGE (a2:Enty {name: "japan"})
MERGE (a3:Airport {name: "HKG"})

MERGE (a)-[:AIRPORT]->(a3)
MERGE (a1)-[:AIRPORT]->(a3)
MERGE (a2)-[:AIRPORT]->(a3);

Result:
[image: mergnd]
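
If the graph view is not at hand, a plain tabular query along these lines should confirm the test data. It only uses the labels and the relationship type created above; the column aliases are my own choice.

```cypher
// List every Enty node together with the airport it points to
MATCH (e:Enty)-[:AIRPORT]->(p:Airport)
RETURN e.name AS entity, p.name AS airport
ORDER BY entity;
```

On an otherwise empty database this should list one row per Enty node, all three pointing to HKG.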

With apoc.text.jaroWinklerDistance you can identify all nodes whose name is similar to a given one, here a.name = "hongkong". I set the condition jaroWinklerDistance >= 0.75.

Test it with this query:

MATCH (a:Enty {name: "hongkong"})
MATCH (b:Enty) WHERE apoc.text.jaroWinklerDistance(a.name, b.name) >= 0.75 AND b.name <> a.name
RETURN *

Result:
[image: mergnd2]

Notice that this does not return the node with name = "japan".
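
To choose a cutoff, it can help to look at the raw scores instead of only filtering on them. The sketch below returns several APOC text measures side by side for every other Enty node, including the ones named in the question. It assumes the APOC library is installed; the column aliases are illustrative. Whether apoc.text.jaroWinklerDistance gives high or low values for similar strings may differ between APOC versions, so it is worth checking what it returns for two identical strings before relying on >= 0.75.

```cypher
// Compare "hongkong" against every other Enty node with several measures
MATCH (a:Enty {name: "hongkong"})
MATCH (b:Enty) WHERE b <> a
RETURN b.name AS candidate,
       apoc.text.jaroWinklerDistance(a.name, b.name)    AS jaroWinkler,
       apoc.text.levenshteinSimilarity(a.name, b.name)  AS levenshtein,
       apoc.text.sorensenDiceSimilarity(a.name, b.name) AS sorensenDice,
       apoc.text.fuzzyMatch(a.name, b.name)             AS fuzzyMatch
ORDER BY candidate;
```

Abbreviations such as "hnkg" drop several characters, so the measures may disagree about them; looking at the columns together shows which one separates "hnkg" from "japan" most clearly on your data.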

Now merge the similar nodes:

MATCH (a:Enty {name: "hongkong"})
MATCH (b:Enty) WHERE apoc.text.jaroWinklerDistance(a.name, b.name) >= 0.75 AND b.name <> a.name
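// Collect every similar node so that all matches are merged into a, not only the first pair
WITH a, collect(b) AS duplicates
CALL apoc.refactor.mergeNodes([a] + duplicates, {properties: "combine", mergeRels: true}) YIELD node
// Show the remaining Enty nodes and their airports
MATCH (n)-[:AIRPORT]->(c)
RETURN n, c;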

Result:
[image: mergnd3]
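
The question also asks that duplicates share the same relationships. A read-only check along the following lines could list candidate pairs across the whole graph, rather than for one hard-coded name, and only when both nodes point to the same airport. This is a sketch under assumptions: it reuses the 0.75 cutoff from above, relies on the AIRPORT relationship from the test data, and is meant to run on the unmerged data, because properties: "combine" stores differing name values as a list on the merged node.

```cypher
// Candidate duplicates: similar names AND at least one shared airport
MATCH (a:Enty)-[:AIRPORT]->(p:Airport)<-[:AIRPORT]-(b:Enty)
WHERE a.name < b.name   // report each pair once and skip self-pairs
  AND apoc.text.jaroWinklerDistance(a.name, b.name) >= 0.75
RETURN a.name AS first, b.name AS second, collect(p.name) AS sharedAirports;
```

Reviewing this list before calling apoc.refactor.mergeNodes gives a chance to catch pairs that are similar by name but are really different entities.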

Please try this in a test environment before applying it to a production database. Hope this works for you.
