I am new to Cypher and work for a non-profit looking into financial crime. Most of our graph consists of persons and entities, and I want to check for duplicates. I tried the following simple query, but it returned everything:
MATCH (a), (b)
WHERE a.name = b.name
How do I match nodes with the exact same name property? How do I match nodes where one name is contained in the other? For example, one node is Mike Green and the other is Mike Green Smith (Mike Green is contained completely in Mike Green Smith).
I really appreciate any advice you have! Just getting started and learning my way around.
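One likely reason the query above returns everything is that `a` and `b` can bind to the same node, so every node with a name matches itself. As a rough sketch of the two checks asked about (not tested against your data), the statements below use plain Cypher only. They assume the property is called `name` on every node of interest; the aliases `shorter` and `longer` are made up for illustration. `id()` still works but is deprecated in Neo4j 5 in favor of `elementId()`.

```cypher
// Run each statement separately.

// 1) Exact duplicates, grouped by name (avoids comparing every pair of nodes).
MATCH (n)
WHERE n.name IS NOT NULL
WITH n.name AS name, collect(n) AS nodes
WHERE size(nodes) > 1
RETURN name, nodes;

// 2) Exact duplicates as pairs: id(a) < id(b) keeps each pair once
//    and stops a node from being paired with itself.
MATCH (a), (b)
WHERE id(a) < id(b)
  AND a.name = b.name
RETURN a.name AS name, a, b;

// 3) Containment, ignoring case: a's name appears inside b's longer name,
//    e.g. "Mike Green" inside "Mike Green Smith".
MATCH (a), (b)
WHERE a <> b
  AND a.name <> b.name
  AND toLower(b.name) CONTAINS toLower(a.name)
RETURN a.name AS shorter, b.name AS longer;
```

Statements 2 and 3 compare every node with every other node, so on a large graph it is probably worth restricting the MATCH to a label (for example a person label, if your model has one) before running them.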
The following query uses a subquery to collect, for each name, the nodes that have the same name and the nodes that have a similar name (the similar names come from a full-text index called "node_name"):
MATCH (a)
CALL {
WITH a.name AS name
MATCH (b)
WHERE name =~ '(?i)' + b.name
WITH collect(b) AS nodes
CALL db.index.fulltext.queryNodes("node_name", name) YIELD node
RETURN name, collect(node) + nodes AS nodes
}
RETURN DISTINCT name, nodes
Thank you Cobra for your help. Unfortunately I am getting an error when I run this in Neo4j. It appears unhappy with the curly brackets and how the "name" was defined. Any thoughts on how to avoid these errors?
MATCH (a)
CALL {
WITH a
MATCH (b)
WHERE a.name =~ '(?i)' + b.name
WITH a, collect(b) AS nodes // keep a in scope for the full-text lookup
CALL db.index.fulltext.queryNodes("node_name", a.name) YIELD node
WITH a, nodes, collect(node) AS similar
RETURN a.name AS name, nodes + similar AS nodes
}
RETURN DISTINCT name, nodes
This query works with all Neo4j versions, but it requires APOC:
MATCH (a)
WITH a.name AS name
CALL apoc.cypher.run('
MATCH (b)
WHERE $name =~ "(?i)" + b.name
WITH collect(b) AS nodes
CALL db.index.fulltext.queryNodes("node_name", $name) YIELD node
WITH nodes, collect(node) AS similar
RETURN $name AS name, nodes + similar AS nodes
', {name:name})
YIELD value
RETURN DISTINCT value.name AS name, value.nodes AS nodes