Not detecting repeated nodes

Reuben · January 20, 2023, 4:48am

My Cypher query does not detect duplicate (repeated) nodes that exist under different labels. The output it gives me is: (no changes, no records)

MATCH (n)
WHERE n.name = "Joining"
WITH n, COUNT(n) as count
WHERE count > 1
RETURN n

However, when I search for the same node as an individual entity using the query below, I get all the available duplicates under the different labels.

MATCH (n)
WHERE n.name = "Joining"
// WITH n, COUNT(n) as count
// WHERE count > 1
RETURN n

Please can anyone explain why and suggest how best to go about it? Thanks

#Neo4J #Cypher #nodeduplicates

@glilienfield

Reuben · January 20, 2023, 6:18am

Thank you as usual @glilienfield

Reuben · January 20, 2023, 5:15am

Is there a way to look for duplicate nodes in general? Something like this?

// this doesn't work, though
MATCH (n)
WITH n, COUNT(n) as count
WHERE count > 1
WITH COLLECT(n) as nodes
RETURN nodes

Reuben · January 20, 2023, 5:00am

Using the collect approach worked:

MATCH (n)
WHERE n.name = "specific name"
WITH COLLECT(n) as nodes
WHERE SIZE(nodes) > 1
RETURN nodes

glilienfield · January 20, 2023, 4:57am

Simple mistake: you are grouping by 'n', which is the node itself. The result is a separate row for each n, each with a count of one, so the count > 1 filter removes every row.
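One way to see this is to run the query from the question without the count > 1 filter. The sketch below only drops that filter and adds labels(n), so you can see each duplicate's label next to its count; nodeCount is just an illustrative alias. Each row should show a count of 1, because every row's grouping key is a different node.

```cypher
// Diagnostic sketch: the query from the question, minus the count > 1 filter.
MATCH (n)
WHERE n.name = "Joining"
WITH n, count(n) AS nodeCount   // n is the grouping key, so each node forms its own group
RETURN labels(n) AS labels, nodeCount
```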

What you want to do is group on the common value, which is n.name.

MATCH (n)
WHERE n.name = "Joining"
WITH n.name as name, COLLECT(n) as duplicates
WHERE size(duplicates) > 1
RETURN duplicates
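The same grouping key also works without the WHERE n.name = "Joining" filter, which lists every name shared by more than one node. This is a minimal sketch, assuming that "duplicate" simply means "same name property" regardless of label; copies and labelsPerCopy are illustrative alias names. Nodes without a name are filtered out first, since Cypher would otherwise group all of them together under a null key.

```cypher
// Sketch: every name value that appears on more than one node, across all labels.
MATCH (n)
WHERE n.name IS NOT NULL                    // nameless nodes would otherwise form one null-keyed group
WITH n.name AS name, collect(n) AS nodes    // name is the grouping key
WHERE size(nodes) > 1
RETURN name,
       size(nodes) AS copies,
       [x IN nodes | labels(x)] AS labelsPerCopy   // shows which labels each copy carries
ORDER BY copies DESC
```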

glilienfield · January 20, 2023, 6:02am

You need to group the nodes by the values that make them duplicates. Assume that you define two nodes as duplicates if their duplicateProperty1 and duplicateProperty2 values are equal; for example, two Person nodes are duplicates if they have the same email address and phone number. Using duplicateProperty1 and duplicateProperty2 as the values that define a duplicate, the following query groups the nodes that share the same values and returns their node ids in a list. You can return whatever you need instead.

match(n)
with n.duplicateProperty1 as property1, n.duplicateProperty2 as property2, collect(id(n)) as ids
where size(ids) > 1
return property1, property2, ids
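On Neo4j 5.x, id() is deprecated in favor of elementId(). A sketch of the same grouping under that assumption is below; duplicateProperty1 and duplicateProperty2 are still placeholders for whatever properties define a duplicate in your data. The added IS NOT NULL checks are an extra assumption that nodes missing either property should not be reported as duplicates of each other.

```cypher
// Sketch assuming Neo4j 5.x, where elementId() replaces the deprecated id().
// duplicateProperty1 / duplicateProperty2 are placeholder property names.
MATCH (n)
WHERE n.duplicateProperty1 IS NOT NULL
  AND n.duplicateProperty2 IS NOT NULL      // skip nodes missing either key property
WITH n.duplicateProperty1 AS property1,
     n.duplicateProperty2 AS property2,     // the two values together form the grouping key
     collect(elementId(n)) AS ids
WHERE size(ids) > 1
RETURN property1, property2, ids
```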

Test data: [screenshot]

Result of grouping by both properties: [screenshot]

glilienfield · January 20, 2023, 5:02am

You know, you are correct: the 'WITH n.name' I used is not necessary, since the value is the same for all the nodes. It can be removed, which gives the query you posted.

Reuben · January 20, 2023, 5:34am

Thanks so much, you are a lifesaver. How about a general scenario to check for duplicates, as I illustrated in my earlier post?

Related topics

Topic	Category / tags	Replies	Views	Activity
Searching for Duplicates with CYPER match on properties	Cypher	12	2040	August 17, 2020
Nodes with duplicate property value	Cypher, browser	2	330	March 2, 2022
Count the number of duplicates in result	Neo4j Graph Platform, migrated	2	110	June 15, 2022
Lowest impact way of finding duplicate nodes	Cypher	1	594	August 15, 2019
Exact match - Check for duplicate nodes / check for duplicate relationships	Cypher (apoc, performance, cypher)	0	8234	December 6, 2018