Exact match - Check for duplicate nodes / check for duplicate relationships

Joel · December 6, 2018, 1:09am

I am writing a few Cypher queries to check for previously encountered issues as new datasets are added (the issues may be caused by the data, the load scripts, or both). These checks are meant to be generic and of general use for us, though they might not apply to everyone's situation. I expect I'm not the first to write tests like these, so I'd like to know whether they can or should be improved: are there situations where they won't work as expected, is there a better way, and can they perform better?

Environment: Neo4j 3.5-Enterprise

Test 1: Count duplicate nodes.
Definition: Two nodes have exactly the same labels and properties (keys and values match exactly)
Expected: 0

// count duplicate nodes
MATCH (a)
WITH labels(a) AS la, properties(a) AS p, count(*) AS cpr
WHERE cpr > 1
RETURN sum(cpr - 1) AS numDuplicateNodes
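
When the count is not 0, it helps to see which nodes are involved. The following is a minimal sketch of that, not a tested replacement for the query above: it uses the same grouping but collects the node ids of each group. It assumes APOC is installed, and it sorts the labels with apoc.coll.sort on the assumption that labels() might not return labels in the same order for every node; if the order is already stable, the sort changes nothing.

```cypher
// list groups of duplicate nodes (sketch; assumes APOC is available)
MATCH (a)
// sort labels so the same set of labels always forms the same grouping key
WITH apoc.coll.sort(labels(a)) AS la, properties(a) AS p, collect(id(a)) AS ids
WHERE size(ids) > 1
RETURN la, p, ids, size(ids) - 1 AS numDuplicates
ORDER BY numDuplicates DESC
```

Summing numDuplicates over all rows should give the same figure as the count query, provided label order is not an issue in the data.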

Test 2: Count duplicate relationships
Definition: For any (a)-[r]->(b), there are two or more r in the same direction with the same type and properties (keys and values match exactly)
Expected: 0

// count duplicate relationships
MATCH (a)-[r]->(b)
WITH a, b, type(r) AS tr, properties(r) AS pr, count(*) AS cpr
WHERE cpr > 1
RETURN sum(cpr - 1) AS numDuplicateRelationships
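
In the same spirit, here is a sketch that lists the duplicated relationships instead of only counting them. It keeps the grouping of the count query (start node, end node, type, properties) and collects the relationship ids of each group. The column names startId, endId and relIds are only illustrative.

```cypher
// list groups of duplicate relationships (sketch)
MATCH (a)-[r]->(b)
WITH id(a) AS startId, id(b) AS endId, type(r) AS tr, properties(r) AS pr,
     collect(id(r)) AS relIds
WHERE size(relIds) > 1
RETURN startId, endId, tr, pr, relIds, size(relIds) - 1 AS numDuplicates
ORDER BY numDuplicates DESC
```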

To test the queries, I currently have to create the issues manually in a dev database.
The Cypher I use to create them may also be of interest. I know these queries would need to be redesigned for a very large database; I'm working with fewer than a million nodes, so they are fast enough in my situation.

// create random duplicate nodes
MATCH (a)
WITH a, rand() AS r
ORDER BY r ASC
WITH a, r LIMIT 10
WITH properties(a) AS pa, labels(a) AS la
CREATE (b) SET b = pa
WITH b, la
CALL apoc.create.addLabels([id(b)], la) YIELD node
RETURN id(node)

// create random duplicate relationships
MATCH (a)-[r]->(b)
WITH a, r, b, rand() AS rnd
ORDER BY rnd ASC
WITH a, r, b LIMIT 10
WITH a, b, r, type(r) AS tr, properties(r) AS pr
CALL apoc.create.relationship(a, tr, pr, b) YIELD rel
RETURN count(rel)
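
An alternative to duplicating random existing data is a small, fixed fixture whose expected counts are known in advance. The sketch below is only an illustration: the DupTest label, the LINKS_TO type and the property names are made up for this example and are not part of any real dataset.

```cypher
// create a tiny fixture with exactly one duplicate node
// and exactly one duplicate relationship (hypothetical names)
CREATE (a:DupTest {name: 'A'}),
       (b:DupTest {name: 'A'}),          // same labels and properties as a
       (c:DupTest {name: 'B'}),
       (a)-[:LINKS_TO {since: 2018}]->(c),
       (a)-[:LINKS_TO {since: 2018}]->(c) // same type, direction and properties
```

In an otherwise empty database, both Test 1 and Test 2 should then return 1. In a database that already holds data, the fixture should raise each count by 1, as long as nothing existing matches the fixture's nodes exactly. Afterwards the fixture can be removed with MATCH (n:DupTest) DETACH DELETE n.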
