
apoc.periodic.iterate for CREATE relationship does not work on large data (500 million nodes)

Peter_Lian

When my data contains 100 million nodes (half labeled Material and the rest labeled Fgprodno, each node having only one property, `label`), the following query:

###

CALL apoc.periodic.iterate("MATCH (n:Material), (p:Fgprodno) WHERE n.label = p.label RETURN n, p",
"CREATE (n)-[r:come_from]->(p)",
{batchSize: 10000, parallel: true})

###

finishes in 3 minutes.
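As a rough mental model (a simplification, not APOC's actual implementation), apoc.periodic.iterate runs the first statement to produce a stream of rows and then executes the second statement once per batch of `batchSize` rows, each batch in its own transaction. The Python sketch below models only that batching step; `batches` and `periodic_iterate` are hypothetical names.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size rows from a (possibly lazy) stream."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk

def periodic_iterate(outer: Iterable[T], inner: Callable[[List[T]], None], batch_size: int) -> int:
    """Hypothetical model: apply `inner` to each batch as if each batch were one transaction.
    Returns the number of batches (transactions) executed."""
    committed = 0
    for chunk in batches(outer, batch_size):
        inner(chunk)  # in Neo4j this would be one transaction running the second statement
        committed += 1
    return committed

if __name__ == "__main__":
    created: List[tuple] = []
    n_tx = periodic_iterate(((i, i) for i in range(25)), created.extend, batch_size=10)
    print(n_tx, len(created))  # 3 25
```

Note that the inner statement only starts once the outer statement emits rows, so anything that delays the first row of the outer query delays every batch.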

With 500 million nodes (same data structure, just larger), the query above never completes, no matter which batchSize I try: it keeps running without returning any result, and CPU usage sits at about 90% on 44 threads (2 sockets, 44 cores, 88 logical processors). However, the following query:

###

CALL apoc.periodic.iterate(
'MATCH (n) RETURN id(n) AS id',
'MATCH (n) WHERE id(n) = id DETACH DELETE n',
{batchSize: 10000, parallel: true});

###

finishes in 18 minutes for the 500 million node case (at that point my data contained only the nodes, no relationships) and in 2.5 minutes for the 100 million node case (which contained 100 million nodes plus 50 million relationships).

This confuses me a lot. Both are just apoc.periodic.iterate calls, yet the CREATE relationship query only works for 100 million nodes, while the DELETE query works for both 100 and 500 million. Is there some time or memory complexity problem with the CREATE query? I still have over 200 GB of free disk space and free memory.
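One way to reason about the complexity question: the DELETE query's outer statement is a single scan that can stream ids immediately, whereas the CREATE query's outer statement, `MATCH (n:Material), (p:Fgprodno) WHERE n.label = p.label`, is a join on the `label` property. Comparing every Material with every Fgprodno is quadratic; a hash join is linear but must build an in-memory table over one side before it emits its first row. Which strategy Neo4j actually chose here is an assumption not confirmed in the thread (prefixing the outer query with `EXPLAIN` shows the plan). The Python sketch below only illustrates the two strategies on toy data; all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[int, str]   # hypothetical (node id, value of the `label` property)
Match = Tuple[int, int]  # (Material id, Fgprodno id) that would get a come_from relationship

def nested_loop_join(materials: List[Node], products: List[Node]) -> Tuple[List[Match], int]:
    """Compare every Material with every Fgprodno; returns matches and the number of comparisons."""
    matches: List[Match] = []
    comparisons = 0
    for m_id, m_label in materials:
        for p_id, p_label in products:
            comparisons += 1
            if m_label == p_label:
                matches.append((m_id, p_id))
    return matches, comparisons

def hash_join(materials: List[Node], products: List[Node]) -> List[Match]:
    """Build an in-memory table over all Fgprodno labels first, then probe it once per Material."""
    table: Dict[str, List[int]] = defaultdict(list)
    for p_id, p_label in products:  # build side: must be complete before any match is produced
        table[p_label].append(p_id)
    return [(m_id, p_id) for m_id, m_label in materials for p_id in table.get(m_label, [])]

def work_estimate(n_materials: int, n_products: int) -> Tuple[int, int]:
    """Rough step counts: (nested-loop comparisons, hash-join build + probe steps)."""
    return n_materials * n_products, n_materials + n_products

if __name__ == "__main__":
    small, big = work_estimate(50_000_000, 50_000_000), work_estimate(250_000_000, 250_000_000)
    print(big[0] // small[0], big[1] // small[1])  # 25 5
```

Going from 100 million to 500 million nodes multiplies the nested-loop estimate by 25 but the hash-join estimate only by 5. The hash table itself, however, also grows fivefold, and in Neo4j query memory comes from the JVM heap, which is configured separately from free disk space and free system memory.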

Question: Why does the query that creates relationships not work, or how can I modify it so that it does? Thanks.


bennu_neo
Neo4j

Hi @Peter_Lian,

Before going into more detail, have you tried with parallel: false?
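A likely reason for this suggestion (the reply does not spell it out): in general, creating a relationship locks both of its end nodes, so with `parallel: true` two batches that share a node, for example two Material nodes with the same `label` pointing at one Fgprodno node, must wait for each other and can even deadlock. The Python sketch below, using hypothetical helper names and that locking assumption, splits the (Material, Fgprodno) pairs into batches and counts how many pairs of batches touch a common node, i.e. would contend for locks if run concurrently.

```python
from itertools import combinations
from typing import List, Set, Tuple

Pair = Tuple[int, int]  # hypothetical (Material node id, Fgprodno node id) for one new relationship

def split(pairs: List[Pair], batch_size: int) -> List[List[Pair]]:
    """Cut the outer query's rows into consecutive batches of batch_size."""
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]

def locked_nodes(batch: List[Pair]) -> Set[int]:
    """Assumption: creating a relationship locks both its start node and its end node."""
    return {node for pair in batch for node in pair}

def conflicting_batch_pairs(pairs: List[Pair], batch_size: int) -> int:
    """Count pairs of batches that lock at least one common node (contention if run in parallel)."""
    locks = [locked_nodes(batch) for batch in split(pairs, batch_size)]
    return sum(1 for a, b in combinations(locks, 2) if a & b)

if __name__ == "__main__":
    shared = [(m, 100) for m in range(1, 7)]             # six Materials, one shared Fgprodno
    disjoint = [(1, 101), (2, 102), (3, 103), (4, 104)]  # every Material has its own Fgprodno
    print(conflicting_batch_pairs(shared, 2), conflicting_batch_pairs(disjoint, 2))  # 3 0
```

With `parallel: false` batches run one after another, so such overlaps cannot cause contention; in the original query that is only a change to the config map, `{batchSize: 10000, parallel: false}`.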
