How to correctly create and count new relationships to avoid duplicates?

I’m working with Neo4j and trying to create CONNECTED_TO relationships between a Task node and Node nodes based on specific conditions.
Here’s the original query that creates the relationships correctly:

MATCH (task:Job {id:$taskId})
MATCH path = (task)-[:STARTS_WITH]->(initialNode:Node)
      (()<-[:PREVIOUS_NODE]-(nextNode:Node) WHERE NOT EXISTS { (nextNode)<-[:RESOLVED_FOR]-(task) })*
      (relatedNode:Node)
WITH task, relatedNode
MERGE (task)-[:CONNECTED_TO]->(relatedNode)

UNION

MATCH path = (task)-[:RESOLVED_FOR]->(resolvedNode:Node)
      (()-[:PREVIOUS_NODE]->(upperNode:Node) WHERE NOT EXISTS { (upperNode)<-[:STARTS_WITH | CONNECTED_TO | RESOLVED_FOR]-(task) })*
      (relatedNode:Node)
WHERE relatedNode <> resolvedNode AND upperNode <> resolvedNode
WITH task, relatedNode
MERGE (task)-[:CONNECTED_TO]->(relatedNode)

I modified the query to return the count of created relationships:

MATCH (task:Job {id: $taskId})
CALL {
    WITH task
    MATCH (task)-[:STARTS_WITH]->(initialNode:Node)
    (()<-[:PREVIOUS_NODE]-(nextNode:Node) WHERE NOT EXISTS { (nextNode)<-[:RESOLVED_FOR]-(task) })*
    (relatedNode:Node)
    RETURN relatedNode
    UNION
    MATCH (task)-[:RESOLVED_FOR]->(resolvedNode:Node)
    (()-[:PREVIOUS_NODE]->(upperNode:Node) WHERE NOT EXISTS { (upperNode)<-[:STARTS_WITH | CONNECTED_TO | RESOLVED_FOR]-(task) })*
    (relatedNode:Node)
    WHERE relatedNode <> resolvedNode AND upperNode <> resolvedNode
    RETURN relatedNode
}
WITH task, relatedNode
WHERE NOT EXISTS ((task)-[:CONNECTED_TO]->(relatedNode))
CREATE (task)-[:CONNECTED_TO]->(relatedNode)
RETURN COUNT(*) as num_of_relationships_created

However, the new query starts creating significantly more relationships than needed, and the result becomes incorrect.

My Goal:

  • Ensure that only those CONNECTED_TO relationships that don’t already exist are created.
  • Accurately count the number of new relationships created.

Question:
Why does the modified query create more relationships than necessary, and how can I fix this?

Hi! Have you tried the PROFILE clause to get an execution plan of both queries and comparing the results returned by the sub-clauses?

In your modified query, you do not avoid duplicates, so maybe try adding a DISTINCT to each of the two inner RETURN statements. The UNION can also result in duplicates, so you have to DISTINCT the result of the CALL again, e.g. by

WITH task, COLLECT(DISTINCT relatedNode) AS relatedNodes
UNWIND relatedNodes AS relatedNode
WITH task, relatedNode

Can you try this + checking the PROFILE result?