I have a special requirement where I need to set up a single relationship out of three different node labels using 2 properties. I can achieve this requirement by running 3 different queries like below but I am failing when I try to achieve the same using a single query.
Query 1.
MATCH (A:NODE_1) WHERE A.p1 = ''
WITH A
MATCH (B:NODE_2) WHERE A.p2 = B.p2
MERGE (B)-[:REL]->(B)
Query 2.
MATCH (A:NODE_1) WHERE A.p2 = ''
WITH A
MATCH (C:NODE_3) WHERE A.p1 = C.p1
MERGE (A)-[:REL]->(C)
Query 3.
MATCH (A:NODE_1) WHERE A.p1 <> '' AND A.p2 <> ''
WITH A
MATCH (B:NODE_2) WHERE A.p2 = B.p2
MERGE (A)-[:REL]->(B)
A contains ~ 1M nodes and B and C contains 10K and 500 respectively. Can these 3 queries be merged together and run in a single query? If so, would someone please help? thanks!!!!
And I edited your query to a single query for Neo4j 3.5.
By the way, I thought the 3rd line was wrong, so I fixed it
from MERGE (B)-[:REL]->(B) to MERGE (A)-[:REL]->(B)
MATCH (A:NODE_1) WHERE A.p1 = ''
MATCH (B:NODE_2) WHERE A.p2 = B.p2
MERGE (A)-[:REL]->(B)
WITH 1 as dummy
MATCH (A:NODE_1) WHERE A.p2 = ''
MATCH (C:NODE_3) WHERE A.p1 = C.p1
MERGE (A)-[:REL]->(C)
WITH 1 as dummy
MATCH (A:NODE_1) WHERE A.p1 <> '' AND A.p2 <> ''
MATCH (B:NODE_2) WHERE A.p2 = B.p2
MERGE (A)-[:REL]->(B);
Thanks so much @koji!
I ran this query and it's been running for a while. Is there way I could use apoc.periodic.iterate to speed up the process with batches?
Thanks!
I think it's a good idea to add an index first, then use apoc.periodic.iterate.
for 3.x
CREATE INDEX ON :NODE_1(p1);
CREATE INDEX ON :NODE_1(p2);
CREATE INDEX ON :NODE_2(p2);
CREATE INDEX ON :NODE_3(p1);
for 4.x
CREATE INDEX node1p1 FOR (n:NODE_1) ON (n.p1);
CREATE INDEX node1p2 FOR (n:NODE_1) ON (n.p2);
CREATE INDEX node2p2 FOR (n:NODE_2) ON (n.p2);
CREATE INDEX node3p1 FOR (n:NODE_3) ON (n.p1);
Yes! I did set up 4 indexes this morning but still, it was running for long and I had to kill it. I think the reason is my Node_1 which has almost 1M nodes. That's why it's taking time to load everything to memory?
I used apoc.periodic.iterate like below previously for individual queries. I am just lost on how to use apoc.periodic.iterate for the single query you helped me put together.
call apoc.periodic.iterate("MATCH (A:NODE_1) WHERE A.p1 = '' WITH A
MATCH (B:NODE_2) WHERE A.p2 = B.p2
RETURN A,B",
"MERGE (B)-[:REL]->(B)",
{batchSize:10000,parallel:false})
CALL apoc.periodic.iterate("MATCH (X:NODE_1) WHERE X.p1 = ''
MATCH (B:NODE_2) WHERE X.p2 = B.p2
MATCH (Y:NODE_1) WHERE Y.p2 = ''
MATCH (C:NODE_3) WHERE Y.p1 = C.p1
MATCH (Z:NODE_1) WHERE Z.p1 <> '' AND Z.p2 <> ''
MATCH (D:NODE_2) WHERE Z.p2 = D.p2
RETURN X,Y,Z,B,C,D",
"MERGE (X)-[:REL]->(B)
MERGE (Y)-[:REL]->(C)
MERGE (Z)-[:REL]->(D)",
{batchSize:10000,parallel:false})
A note on this one, I see what you're intending (using the dummy to reset between each), but the problem arising from this is cardinality. The WITH 1 as dummy does not reset the cardinality. You would need to either use an aggregation or use WITH DISTINCT 1 as dummy to reset it properly in between.
Here's a knowledge base article on this:
Another approach you might consider...
You could use UNION ALL between the subqueries here (returning maybe a count(*) from each of them). This makes it easier to ensure the queries are isolated from each other, with no cardinality issues between them.