How to correctly create and count new relationships to avoid duplicates?

awpmedusa22 · January 28, 2025, 4:59pm

I’m working with Neo4j and trying to create CONNECTED_TO relationships between a Task node and Node nodes based on specific conditions.
Here’s the original query that creates the relationships correctly:

MATCH (task:Job {id:$taskId})
MATCH path = (task)-[:STARTS_WITH]->(initialNode:Node)
      (()<-[:PREVIOUS_NODE]-(nextNode:Node) WHERE NOT EXISTS { (nextNode)<-[:RESOLVED_FOR]-(task) })*
      (relatedNode:Node)
WITH task, relatedNode
MERGE (task)-[:CONNECTED_TO]->(relatedNode)

UNION

MATCH path = (task)-[:RESOLVED_FOR]->(resolvedNode:Node)
      (()-[:PREVIOUS_NODE]->(upperNode:Node) WHERE NOT EXISTS { (upperNode)<-[:STARTS_WITH | CONNECTED_TO | RESOLVED_FOR]-(task) })*
      (relatedNode:Node)
WHERE relatedNode <> resolvedNode AND upperNode <> resolvedNode
WITH task, relatedNode
MERGE (task)-[:CONNECTED_TO]->(relatedNode)

I modified the query to return the count of created relationships:

MATCH (task:Job {id: $taskId})
CALL {
    WITH task
    MATCH (task)-[:STARTS_WITH]->(initialNode:Node)
    (()<-[:PREVIOUS_NODE]-(nextNode:Node) WHERE NOT EXISTS { (nextNode)<-[:RESOLVED_FOR]-(task) })*
    (relatedNode:Node)
    RETURN relatedNode
    UNION
    MATCH (task)-[:RESOLVED_FOR]->(resolvedNode:Node)
    (()-[:PREVIOUS_NODE]->(upperNode:Node) WHERE NOT EXISTS { (upperNode)<-[:STARTS_WITH | CONNECTED_TO | RESOLVED_FOR]-(task) })*
    (relatedNode:Node)
    WHERE relatedNode <> resolvedNode AND upperNode <> resolvedNode
    RETURN relatedNode
}
WITH task, relatedNode
WHERE NOT EXISTS ((task)-[:CONNECTED_TO]->(relatedNode))
CREATE (task)-[:CONNECTED_TO]->(relatedNode)
RETURN COUNT(*) as num_of_relationships_created

However, the new query starts creating significantly more relationships than needed, and the result becomes incorrect.

My Goal:

Ensure that only those CONNECTED_TO relationships that don’t already exist are created.
Accurately count the number of new relationships created.

Question:
Why does the modified query create more relationships than necessary, and how can I fix this?

christopher.rost · February 11, 2025, 3:37pm

Hi! Have you tried the PROFILE clause to get an execution plan of both queries and comparing the results returned by the sub-clauses?

In your modified query, you do not avoid duplicates, so maybe try adding a DISTINCT to each of the two inner RETURN statements. The UNION can also result in duplicates, so you have to DISTINCT the result of the CALL again, e.g. by

WITH task, COLLECT(DISTINCT relatedNode) AS relatedNodes
UNWIND relatedNodes AS relatedNode
WITH task, relatedNode

Can you try this + checking the PROFILE result?

Topic		Replies	Views
How to count the number of relationships created in the following Cypher query? Cypher relationship , count	4	102	January 28, 2025
Duplicated (gray) nodes when creating new relationship Newbie Questions	2	680	June 22, 2020
Neo4j query for relationship Cypher	4	5755	December 2, 2019
Create relationship between two existing nodes without introducing new nodes Cypher performance , cypher , import	3	8943	December 15, 2019
Dynamically Create Nodes & Relationships Cypher	5	972	November 22, 2023

Get Certified in June!

How to correctly create and count new relationships to avoid duplicates?

Related topics