Performance Issues Merging Nodes

rezaahnadi99887
Node Link

Hi,
I have a database with 20 million nodes and 10 million relationships. I want to merge nodes that have the same code number property.

My Cypher query looks like this:


CALL apoc.periodic.iterate("
MATCH (n:Person) with distinct n.code as props return props
","
UNWIND props as prop
CALL{
WITH prop
MATCH (n:Person {code:prop}) 
with COLLECT(n) AS ns, count(n) as cn where cn > 1
CALL apoc.refactor.mergeNodes(ns, {properties:{OtherCodes:'combine', `.*`: 'overwrite'}}) 
YIELD node RETURN node as s
}WITH s
RETURN s;
", {batchSize:10000, parallel:true, iterateList:true});

But it does nothing. There are no errors, yet nothing gets processed.

I am using Neo4j 4.3.6.

1 ACCEPTED SOLUTION

Remove the UNWIND from your second statement.

I presume you have an index on :Person(code)?

You don't need the subquery.

How many people with the same code do you have: 10, 100, 10,000?

CALL apoc.periodic.iterate("
MATCH (n:Person) return distinct n.code as prop
","
MATCH (n:Person {code:prop}) 
with prop, COLLECT(n) AS ns, count(n) as cn where cn > 1
CALL apoc.refactor.mergeNodes(ns, {properties:{OtherCodes:'combine', `.*`: 'overwrite'}}) 
YIELD node 
RETURN count(*)
", {batchSize:100, parallel:true, iterateList:true});

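The two questions in the answer (is there an index on :Person(code), and how large are the duplicate groups) can be checked before running the merge. The following is only a sketch: the index name person_code is a made-up name for illustration, and the aliases in the second query are arbitrary.

```cypher
// Create the index if it is missing (person_code is an assumed, hypothetical name).
CREATE INDEX person_code IF NOT EXISTS FOR (n:Person) ON (n.code);

// Count how many codes are duplicated and how large the biggest group is.
MATCH (n:Person)
WITH n.code AS code, count(*) AS cn
WHERE cn > 1
RETURN count(code) AS duplicatedCodes,
       max(cn)     AS largestGroup,
       sum(cn)     AS nodesInvolved;
```

If largestGroup turns out to be very large, each call to apoc.refactor.mergeNodes has to handle that whole group in one transaction, so a smaller batchSize is probably the safer starting point.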

3 REPLIES

Thanks for your answer.
Yes, I have an index on :Person(code).
The number of people with the same code is about 100,000.

I have a similar problem with relationships.
Because of the error "All Relationships must have the same start and end nodes.", I wrote a function in my plugin that groups relationships by their start and end nodes, and it works fine.

My Cypher query looks like this:

CALL apoc.periodic.iterate("
MATCH (s:Person)-[r:Work]-(t:Office) WITH COLLECT(r) as lrs RETURN lrs
","
with customPlugin.relations.groupByStartAndEnd(lrs) as grs
UNWIND grs as gr
CALL apoc.refactor.mergeRelationships(gr) YIELD rel RETURN rel
", {batchSize:500, parallel:true, iterateList:true});

But when I use apoc.refactor.mergeRelationships, after a while I see the following error in the logs:

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "neo4j.Scheduler-1"
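
One thing worth noting about the query above: the first statement collects every Work relationship into a single list (COLLECT(r) returns one row), so apoc.periodic.iterate has only one item to batch and the whole list is held in memory at once. A possible alternative, offered only as a sketch, is to let the first statement return one row per (start, end) pair and do the grouping in plain Cypher. It assumes the Work relationships point from Person to Office, and it sets parallel:false as a cautious choice, since merging relationships takes locks on the nodes at both ends.

```cypher
CALL apoc.periodic.iterate("
// One row per Person/Office pair that has more than one Work relationship.
// Assumption: Work relationships are directed Person -> Office.
MATCH (s:Person)-[r:Work]->(t:Office)
WITH s, t, count(r) AS cnt
WHERE cnt > 1
RETURN s, t
","
// All relationships collected here share the same start and end node.
MATCH (s)-[r:Work]->(t)
WITH s, t, collect(r) AS rels
CALL apoc.refactor.mergeRelationships(rels) YIELD rel
RETURN count(*)
", {batchSize:500, parallel:false});
```

This may reduce memory pressure because each batch only touches the relationships of at most 500 node pairs, but it has not been tested against this dataset, and the heap settings may still need a look.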