Merge all nodes with the same property name

Joe123 · January 21, 2019, 12:43pm

Hi, I've a problem that I do not know how to code in cypher. I have duplicate nodes with the same property name, (n.name) and they have their own relationships. I wanted to match these nodes, merges the properties and relationships of the 2nd through last nodes onto the first node, and deletes the 2nd through last nodes. I have the code below and it works.

MATCH (n:name)
WHERE n.name = "john"
WITH COLLECT(n) AS ns
CALL apoc.refactor.mergeNodes(ns) YIELD node
RETURN node;

However, I do not want to input the name manually. Is it possible to do a for loop all the nodes that have the same property name and do the above code? like john, jack, jane ...

bratanic_tomaz · January 21, 2019, 2:40pm

to find nodes with the same property value

MATCH (n1:name),(n2:name)
WHERE n1.name = n2.name and id(n1) < id(n2)
WITH [n1,n2] as ns
CALL apoc.refactor.mergeNodes(ns) YIELD node
RETURN node

Joe123 · January 21, 2019, 4:05pm

id(n1) < id(n2)
don't quite get this and got syntax error
does the id refer to the internal id that neo4j created? i do not have id as property in the node

bratanic_tomaz · January 21, 2019, 4:17pm

Yes, this is internal id of Neo4j. I fixed the syntax error

Joe123 · January 21, 2019, 4:37pm

works! thanks alot!!!!!

i.m · January 29, 2019, 6:10am

Hello, I have actually been looking for a solution like that as well. For me it however leads to a Cartesian product? Is there a way to avoid that?

andrew_bowman · January 29, 2019, 6:47am

Yes, it will look similar to the query in the first post, except we'll collect with respect to name (for each row/name, we'll get the collection of nodes with that name) and filter to only rows where there are multiple nodes for that name:

MATCH (n:name)
WITH n.name as name, COLLECT(n) AS ns
WHERE size(ns) > 1
CALL apoc.refactor.mergeNodes(ns) YIELD node
RETURN node;

i.m · January 29, 2019, 11:32am

Brilliant, thanks a lot!!!

Joe123 · March 12, 2019, 2:41am

Hi, I've an error if the property of my nodes are uniquely indexed. Are there any work around with it?

New data does not satisfy CONSTRAINT ON ( label[0]:label[0] ) ASSERT label[0].property[0] IS UNIQUE.

Joe123 · March 12, 2019, 3:25am

Nvm solved it using this

CALL apoc.refactor.mergeNodes(nodes,{properties:"combine", mergeRels:true}) yield node

katircib · December 12, 2019, 6:15pm

Hi andrew, is it possible to create case insensitive merge query like this one?

andrew_bowman · December 12, 2019, 7:09pm

You mean case insensitive on the name property (or the common property used to identify duplicates), or case insensitive on the properties being merged?

If you want case insensitive on the name property, then it should be enough to use toLower() or toUpper() on that property at the time of the collection:

MATCH (n:name)
WITH toLower(n.name) as name, COLLECT(n) AS ns
WHERE size(ns) > 1
CALL apoc.refactor.mergeNodes(ns) YIELD node
RETURN node;

katircib · December 12, 2019, 7:41pm

Worked like a charm, thanks.

nwrpub · March 8, 2020, 2:39pm

Hi, I've got a very similar question, but I am unsure on how to solve it.
I have a pretty big database (>3 000 000 nodes) and I'm trying to merge nodes but those who have multiple similar properties only.

First option works ok, but it's doing a cartesian product, and I fear of running out of memory, or that it will take ages to complete.

I want to use second option, but I don't quite understand how "WITH ... as ... COLLECT" works.
Is this query correct ?

MATCH (n:Word) WITH toLower(n.spelling) as spelling AND toLower(n.pos) as pos AND toLower(n.language) as spelling, COLLECT(n) AS ns
WHERE size(ns) > 1
CALL apoc.refactor.mergeNodes(ns) YIELD node
RETURN node;

I hope this is an appropriate place to ask my question. Thank you for ur help
Please excuse my fragile english

clem · January 9, 2021, 2:32am

This is my solution that avoids the Cartesian product.

The trick is to use CALL to force a first MATCH to completion without involving a second MATCH in a Cartesian Product. This generates the duplicate nodes and a list of the property (names) values of those nodes that have been duplicated. The list of properties that are duplicate makes searching for the duplicates much faster.

The 3rd MATCH (outside of the CALL) is highly filtered so it's much faster. It gets the two duplicate nodes (that are different because they have different internal id's).

CALL {MATCH (n:Label)
WITH COLLECT(n.name) AS names  // I have to admit I don't know why this works.
WHERE size(names) > 1 
WITH collect(names[0]) as bnames // Makes a list of names that were duplicated but without duplicates
MATCH (b:Label) WHERE b.name in bnames
return b, bnames}
WITH b, bnames // b is duplicate nodes, bnames is list of names that were dupped

// b2 is the node that is a duplicate of b (by name)
MATCH (b2:Label) // get a second node
WHERE b2.name in bnames AND b2.name = b.name AND id(b) > id(b2) // fast match a duplicate
CALL apoc.refactor.mergeNodes([b, b2],
     {properties:"combine", mergeRels:true}) // merging details
YIELD node
RETURN node

I'm at the intermediate level, so this query could be improved.... For instance, I suspect that this test isn't needed:

b2.name in bnames

Anyway, this query ran faster than the other versions above.

I also Cloned my DB before running the query just in case something went awry...

I hope this helps.

Topic		Replies	Views
Merge all nodes with the same property name (case insensitive) Cypher	1	1041	December 12, 2019
Can we merge multiple Nodes based on Single common property Browser cypher	12	13073	March 4, 2020
apoc.refactor.mergeNodes Performance Neo4j Graph Platform apoc , performance , migrated , mergenode	6	375	November 28, 2022
Remove property duplicates Browser	2	357	March 25, 2020
Performance Issues Merging Nodes Cypher apoc , performance , cypher	3	361	March 13, 2022

Take the Course Then Join The Aura Agent Hackathon

Merge all nodes with the same property name

Related topics

Take the Course Then Join
The Aura Agent Hackathon