How do I optimize my merge relation?


(Tanmoymoulik) #1

In my project, we load nodes into a Neo4j DB from different sources at 8-hour intervals. For example, we load the label (:CALte), and similarly we load (:POLte). These can number in the lakhs. Cards and ports are joined on the attribute condition CARD_CI_NAME = PARENT_CI_NAME, so we merge them using this script:

call apoc.periodic.iterate("MATCH (caltea:CALte) , (poltea:POLte)
where caltea.CARD_CI_NAME = poltea.PARENT_CI_NAME return caltea,poltea","merge (caltea)-[r:Card2Port]->(poltea) return count(r)",
{batchSize:10000, iterateList:true, retries:3,parallel:true})
yield batches, total return batches, total;

However, since lakhs of nodes are uploaded every 8 hours, the relationships between CALte and POLte have to be built each time. Every time this command is executed, the old relationships are processed again, which adds to the time the merge takes. How can we skip nodes that are already merged, so that existing relationships are not rebuilt?


(M. David Allen) #2

You can add a WHERE NOT clause to avoid matching ones whose relationship you have already created.

where NOT (caltea)-[:Card2Port]->(poltea) AND caltea.CARD_CI_NAME = poltea.PARENT_CI_NAME
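For reference, here is a sketch of how that clause might slot into the original call. The labels, property names, and options are copied from the question as-is; only the extra NOT predicate is new, and I have not run this against your data.

```cypher
// Outer query: only return card/port pairs that are not yet connected.
// Inner query and options are unchanged from the original script.
CALL apoc.periodic.iterate(
  "MATCH (caltea:CALte), (poltea:POLte)
   WHERE NOT (caltea)-[:Card2Port]->(poltea)
     AND caltea.CARD_CI_NAME = poltea.PARENT_CI_NAME
   RETURN caltea, poltea",
  "MERGE (caltea)-[r:Card2Port]->(poltea) RETURN count(r)",
  {batchSize: 10000, iterateList: true, retries: 3, parallel: true})
YIELD batches, total
RETURN batches, total;
```

With this filter, `total` should reflect only the pairs that still needed a relationship, so a run with nothing new to link should report zero.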

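Another approach: when you merge the new relationship, also set a property on the caltea node (for example linked: true) and use it as a filter to skip those nodes on later runs. This may be even better, because the filter can use an index. Note that a node with no linked property at all does not match caltea.linked = false (a missing property is null), so newly loaded CALte nodes need linked: false set at load time. Something like: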

CREATE INDEX ON :CALte(linked);
call apoc.periodic.iterate("MATCH (caltea:CALte) , (poltea:POLte)
where caltea.linked = false and caltea.CARD_CI_NAME = poltea.PARENT_CI_NAME return caltea,poltea",
"merge (caltea)-[r:Card2Port]->(poltea) SET caltea.linked = true return count(r)",
{batchSize:10000, iterateList:true, retries:3,parallel:true})
yield batches, total return batches, total;
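If the CALte nodes already in the database were loaded without a linked property, the filter above will skip all of them. A one-time backfill along these lines could initialise the flag before the first run. This is a rough sketch that assumes any card with at least one existing Card2Port relationship counts as linked; adjust it if a card can gain new ports later.

```cypher
// Step 1: cards that already have a Card2Port relationship are marked linked.
MATCH (c:CALte)-[:Card2Port]->()
WHERE c.linked IS NULL
WITH DISTINCT c
SET c.linked = true;

// Step 2: every remaining card without the flag is marked as not yet linked.
MATCH (c:CALte)
WHERE c.linked IS NULL
SET c.linked = false;
```

Run the two statements in this order, otherwise step 2 would mark already-connected cards as unlinked. For very large node counts you may want to wrap each step in apoc.periodic.iterate as well.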

(Tanmoymoulik) #3

Thanks for the solution.