Creating relationship over several millions of nodes

  1. Negation is expensive
    (good formatting isn't) 😜

First, let's get this one running faster:

MATCH (i:Inventory), (p:Products)
WHERE (p.p3 <> '0' AND i.p1 = p.p1) OR // this one is gonna *kill* your CPU and RAM
      (p.p3 <> '0' AND i.p2 = p.p2)
RETURN i, p

So... let's make sure we only need to evaluate that negation once, and group the rest of the WHERE clause more carefully:

MATCH (p:Products)
WHERE p.p3 <> '0'
WITH p
MATCH (i:Inventory)
WHERE i.p1 = p.p1 OR i.p2 = p.p2
RETURN i, p

...which will get you to a faster solution:

CALL apoc.periodic.iterate("
    MATCH (p:Products) WHERE p.p3 <> '0' WITH p
    MATCH (i:Inventory) WHERE i.p1 = p.p1 OR i.p2 = p.p2
    RETURN i, p
","
    MERGE (i)-[:PRODUCES]->(p)
",{batchSize:20000, parallel:false})

However, these relationships would probably be better handled when the data is imported or changed. When you're updating the inventory, the additional Cypher to adjust these relationships wouldn't have to check the entire DB. I strongly advise taking a very close look at whatever is creating or updating this data, as it will be orders of magnitude more efficient to do it then, instead of after the fact.
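
As a rough sketch of what doing it at update time could look like, the query below upserts a single inventory item and links it to matching products in the same statement. The parameters `$p1` and `$p2` are hypothetical values supplied by whatever import job writes the inventory, and the sketch assumes that `p1` and `p2` together identify an Inventory node, which may not hold for your data:

```cypher
// Sketch only: $p1 and $p2 are hypothetical parameters from the import job.
// Assumes (p1, p2) together identify one Inventory node.
MERGE (i:Inventory {p1: $p1, p2: $p2})
WITH i
// Same matching rule as above, but evaluated for this one item only
MATCH (p:Products)
WHERE p.p3 <> '0' AND (p.p1 = i.p1 OR p.p2 = i.p2)
MERGE (i)-[:PRODUCES]->(p)
```

Because this starts from one Inventory node, it only has to compare that node against the products, instead of re-checking every Inventory/Products pair after the fact.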
