Single relationship over a 3 different node labels

karthik.shetty · August 31, 2020, 11:42pm

Hi!

I have a special requirement where I need to set up a single relationship out of three different node labels using 2 properties. I can achieve this requirement by running 3 different queries like below but I am failing when I try to achieve the same using a single query.

Query 1.

MATCH (A:NODE_1) WHERE A.p1 = ''
WITH A
MATCH (B:NODE_2) WHERE A.p2 = B.p2 
MERGE (B)-[:REL]->(B)

Query 2.

MATCH (A:NODE_1) WHERE A.p2 = ''
WITH A
MATCH (C:NODE_3) WHERE A.p1 = C.p1
MERGE (A)-[:REL]->(C)

Query 3.

MATCH (A:NODE_1) WHERE A.p1 <> '' AND A.p2 <> ''
WITH A
MATCH (B:NODE_2) WHERE A.p2 = B.p2
MERGE (A)-[:REL]->(B)

A contains ~ 1M nodes and B and C contains 10K and 500 respectively. Can these 3 queries be merged together and run in a single query? If so, would someone please help? thanks!!!!

koji · September 1, 2020, 7:33am

Hi,

I created test data.

CREATE (:NODE_1 {p1: '', p2: 'aa'}),
       (:NODE_1 {p1: 'bb', p2: ''}),
       (:NODE_1 {p1: 'cc', p2: 'dd'}),
       (:NODE_2 {p2: 'aa'}),
       (:NODE_2 {p2: 'bb'}),
       (:NODE_2 {p2: 'cc'}),
       (:NODE_2 {p2: 'dd'}),
       (:NODE_3 {p1: 'aa'}),
       (:NODE_3 {p1: 'bb'});

And I edited your query to a single query for Neo4j 3.5.

By the way, I thought the 3rd line was wrong, so I fixed it
from MERGE (B)-[:REL]->(B) to MERGE (A)-[:REL]->(B)

MATCH (A:NODE_1) WHERE A.p1 = ''
MATCH (B:NODE_2) WHERE A.p2 = B.p2
MERGE (A)-[:REL]->(B)
WITH 1 as dummy
MATCH (A:NODE_1) WHERE A.p2 = ''
MATCH (C:NODE_3) WHERE A.p1 = C.p1
MERGE (A)-[:REL]->(C)
WITH 1 as dummy
MATCH (A:NODE_1) WHERE A.p1 <> '' AND A.p2 <> ''
MATCH (B:NODE_2) WHERE A.p2 = B.p2
MERGE (A)-[:REL]->(B);

It's not good code, but it works.

karthik.shetty · September 1, 2020, 3:24pm

Thanks so much @koji!
I ran this query and it's been running for a while. Is there way I could use apoc.periodic.iterate to speed up the process with batches?
Thanks!

koji · September 1, 2020, 6:51pm

Hi,

I think it's a good idea to add an index first, then use apoc.periodic.iterate.

for 3.x

CREATE INDEX ON :NODE_1(p1);
CREATE INDEX ON :NODE_1(p2);
CREATE INDEX ON :NODE_2(p2);
CREATE INDEX ON :NODE_3(p1);

for 4.x

CREATE INDEX node1p1 FOR (n:NODE_1) ON (n.p1);
CREATE INDEX node1p2 FOR (n:NODE_1) ON (n.p2);
CREATE INDEX node2p2 FOR (n:NODE_2) ON (n.p2);
CREATE INDEX node3p1 FOR (n:NODE_3) ON (n.p1);

karthik.shetty · September 1, 2020, 7:09pm

Yes! I did set up 4 indexes this morning but still, it was running for long and I had to kill it. I think the reason is my Node_1 which has almost 1M nodes. That's why it's taking time to load everything to memory?

I used apoc.periodic.iterate like below previously for individual queries. I am just lost on how to use apoc.periodic.iterate for the single query you helped me put together.

call apoc.periodic.iterate("MATCH (A:NODE_1) WHERE A.p1 = '' WITH A
MATCH (B:NODE_2) WHERE A.p2 = B.p2 
RETURN A,B",
"MERGE (B)-[:REL]->(B)",
{batchSize:10000,parallel:false})

koji · September 1, 2020, 11:38pm

How about this one.

CALL apoc.periodic.iterate("MATCH (X:NODE_1) WHERE X.p1 = ''
MATCH (B:NODE_2) WHERE X.p2 = B.p2
MATCH (Y:NODE_1) WHERE Y.p2 = ''
MATCH (C:NODE_3) WHERE Y.p1 = C.p1
MATCH (Z:NODE_1) WHERE Z.p1 <> '' AND Z.p2 <> ''
MATCH (D:NODE_2) WHERE Z.p2 = D.p2
RETURN X,Y,Z,B,C,D",
"MERGE (X)-[:REL]->(B)
MERGE (Y)-[:REL]->(C)
MERGE (Z)-[:REL]->(D)",
{batchSize:10000,parallel:false})

andrew_bowman · September 1, 2020, 11:49pm

koji:

MATCH (A:NODE_1) WHERE A.p1 = ''
MATCH (B:NODE_2) WHERE A.p2 = B.p2
MERGE (A)-[:REL]->(B)
WITH 1 as dummy
MATCH (A:NODE_1) WHERE A.p2 = ''
MATCH (C:NODE_3) WHERE A.p1 = C.p1
MERGE (A)-[:REL]->(C)
WITH 1 as dummy
MATCH (A:NODE_1) WHERE A.p1 <> '' AND A.p2 <> ''
MATCH (B:NODE_2) WHERE A.p2 = B.p2
MERGE (A)-[:REL]->(B);

A note on this one, I see what you're intending (using the dummy to reset between each), but the problem arising from this is cardinality. The WITH 1 as dummy does not reset the cardinality. You would need to either use an aggregation or use WITH DISTINCT 1 as dummy to reset it properly in between.

Here's a knowledge base article on this:

Another approach you might consider...

You could use UNION ALL between the subqueries here (returning maybe a count(*) from each of them). This makes it easier to ensure the queries are isolated from each other, with no cardinality issues between them.

koji · September 2, 2020, 12:35am

@andrew_bowman
Thanks for your comment, I was initially going with this :)

WITH count(*) AS dummy

In this case, I thought your comment "UNION ALL" is good and easy to read.

Topic		Replies	Views
How to combine two different queries together into a single one Cypher	1	3522	February 13, 2019
Writing out performant queries for n-relationship queries on a single node where n > 2 Cypher	20	3626	October 14, 2019
Combining queries Cypher	5	554	January 9, 2020
Creating relationship over several millions of nodes Cypher apoc , performance , cypher , relationship	23	3084	September 24, 2020
Create relationship using if...else condition Cypher cypher	9	3043	August 27, 2020

Take the Course Then Join The Aura Agent Hackathon

Single relationship over a 3 different node labels

Related topics

Take the Course Then Join
The Aura Agent Hackathon