Not detecting repeated nodes

Reuben
Graph Buddy

My Cypher query does not detect duplicate (repeated) nodes that exist under different labels. The output it gives me is: (no changes, no records)

 

MATCH (n)
WHERE n.name = "Joining"
WITH n, COUNT(n) as count
WHERE count > 1
RETURN n

 

However, when I search for the same node as an individual entity, the query below returns all the available duplicates under the different labels.

 

MATCH (n)
WHERE n.name = "Joining"
// WITH n, COUNT(n) as count
// WHERE count > 1
RETURN n

 

Please can anyone explain why and suggest how best to go about it? Thanks 

#Neo4J #Cypher #nodeduplicates 

@glilienfield 

7 REPLIES

glilienfield
Ninja

Simple mistake: you are grouping by 'n', which is the node itself. Each node forms its own group, so the result is a separate row for each n with a count of one, and the 'count > 1' filter removes every row.

What you want to do is group on the common value, which is n.name. 

MATCH (n)
WHERE n.name = "Joining"
WITH n.name as name, COLLECT(n) as duplicates
WHERE size(duplicates) > 1
RETURN duplicates
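
To see the difference between the two grouping keys directly, the counts can be returned side by side. This is only a minimal sketch: it assumes the same name value, "Joining", from the question, and the aliases nodeCount and labelsPerNode are illustrative names, not anything from the original queries.

```cypher
// Grouping key is the node itself: one row per node, so nodeCount is always 1
// and a filter such as nodeCount > 1 would remove every row.
MATCH (n)
WHERE n.name = "Joining"
RETURN n, count(n) AS nodeCount;

// Grouping key is the property value: all matching nodes fall into one group,
// so nodeCount is the number of nodes sharing the name, whatever their labels.
MATCH (n)
WHERE n.name = "Joining"
RETURN n.name AS name, count(n) AS nodeCount, collect(labels(n)) AS labelsPerNode;
```

The semicolons separate the two statements when they are run together, for example in cypher-shell. In Cypher, any non-aggregated expression in a WITH or RETURN clause acts as the grouping key, which is why swapping n for n.name changes the result.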

 

Reuben
Graph Buddy

Using the collect approach worked:

MATCH (n)
WHERE n.name = "specific name"
WITH COLLECT(n) as nodes
WHERE SIZE(nodes) > 1
RETURN nodes

 

You are correct: the 'WITH n.name AS name' grouping I used is not necessary, since the WHERE clause already restricts every matched node to the same name. It can be removed, which gives the query you posted.

Thanks so much, you are a life saver. How about a general scenario for checking duplicates, as I illustrate below?

Reuben
Graph Buddy

Is there a way to look for duplicate nodes in general? Something like this?

// this doesn't work though
MATCH (n)
WITH n, COUNT(n) as count
WHERE count > 1
WITH COLLECT(n) as nodes
RETURN nodes

You need to group the nodes by the values that make them duplicates. Suppose two nodes count as duplicates when their duplicateProperty1 and duplicateProperty2 values are equal; for example, two Person nodes are duplicates if they have the same email address and phone number. With those two properties as the grouping keys, the following query groups the nodes that share the same values and returns their node ids in a list. You can return whatever you need instead.

 

MATCH (n)
WITH n.duplicateProperty1 AS property1, n.duplicateProperty2 AS property2, collect(id(n)) AS ids
WHERE size(ids) > 1
RETURN property1, property2, ids

 

Test data:

 Screen Shot 2023-01-20 at 12.58.26 AM.png

Result of grouping by both properties:

Screen Shot 2023-01-20 at 12.58.51 AM.png
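
One caveat with the grouping query above: nodes that lack both properties share the grouping key (null, null), so on a graph with mixed labels they are likely to be reported as one large group of "duplicates". A hedged variant that skips those nodes and also returns each node's labels might look like the sketch below. The property names are the same placeholders used in the answer, labelsPerNode is an illustrative alias, and elementId() assumes Neo4j 5 or later; on older versions, id() as used above fills the same role.

```cypher
// duplicateProperty1 / duplicateProperty2 are placeholders for whatever
// properties define a duplicate in your data.
MATCH (n)
WHERE n.duplicateProperty1 IS NOT NULL
  AND n.duplicateProperty2 IS NOT NULL
WITH n.duplicateProperty1 AS property1,
     n.duplicateProperty2 AS property2,
     collect(n) AS nodes
WHERE size(nodes) > 1
RETURN property1,
       property2,
       [x IN nodes | elementId(x)] AS ids,          // identifiers of the grouped nodes
       [x IN nodes | labels(x)] AS labelsPerNode    // labels of each node, in the same order
```

Returning the labels alongside the ids makes it easier to see which duplicates sit under different labels, which was the original concern in this thread.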

Reuben
Graph Buddy

Thank you as usual @glilienfield