Unique relation to a virtual node

Hey Community!

I'm looking to create a Virtual Node which combines all the properties, inbound and outbound relationships of two nodes.

I've made a simplified dataset to illustrate my challenge:

CREATE (sub1:Sub {attr1: "1"})
CREATE (sub2:Sub {attr2: "2"})
CREATE (value1:Val {valueAttr1: "1"})
CREATE (value2:Val {valueAttr2: "2"})
CREATE (value3:Val {valueAttr3: "3"})
CREATE (vc1:VC)-[:subRel]->(sub1)-[:value]->(value1)
CREATE (vc2:VC)-[:subRel]->(sub2)-[:value]->(value2)
MERGE (vc1)-[:subRel]->(sub1)-[:value]->(value1)
MERGE (vc2)-[:subRel]->(sub2)-[:value]->(value2)
MERGE (sub2)-[:value]->(value3)

This is what I have been trying:

MATCH (n:Sub)

WITH collect(n) AS nodes
WITH apoc.map.mergeList([node IN nodes | apoc.any.properties(node)]) AS mergedProps, nodes

CALL apoc.create.vNode(['vSub', 'VirtualNode'], mergedProps) YIELD node AS virtualNode

WITH virtualNode, nodes
UNWIND nodes AS n
MATCH (n)-[rOut]->(relatedOutboundNode)
WITH virtualNode, n, nodes, rOut, relatedOutboundNode
CALL apoc.create.vRelationship(virtualNode, type(rOut), apoc.any.properties(rOut), relatedOutboundNode) YIELD rel AS vOutRel

WITH DISTINCT virtualNode, vOutRel, relatedOutboundNode, rOut, n, nodes
MATCH (n)<-[rIn]-(relatedInboundNode)
WITH virtualNode, rIn, relatedOutboundNode, nodes, relatedInboundNode, vOutRel, rOut
WHERE NOT (relatedInboundNode)-[:subRel]->(virtualNode)  
CALL apoc.create.vRelationship(relatedInboundNode, type(rIn), apoc.any.properties(rIn), virtualNode) YIELD rel AS vInRel

RETURN virtualNode, rOut, rIn, relatedOutboundNode, nodes, vOutRel, relatedInboundNode, vInRel

My problem is that the two :Val nodes make my query duplicate the :subRel relations. I need to "carry forward" (WITH) the outbound relationships and nodes in order to have them be part of the RETURN, and that messes up my attempts to make only one unique virtual :subRel.

Here's a screenshot with the virtual node highlighted in the middle:
[image: screenshot of the graph with the virtual node in the middle]

I'm looking for a way to only have the one subRel, corresponding to (vc2)-[:subRel]->(sub2), irrespective of how many :Val nodes there are.

Any ideas how I can tweak my query will be super appreciated!
Nis

Update: I managed to explain to an AI robot what I was after, and it helped me with the syntax (how to collect and subsequently unwind). I'm pretty sure this does what I am looking for:

MATCH (n:Sub)

WITH collect(n) AS nodes
WITH apoc.map.mergeList([node IN nodes | apoc.any.properties(node)]) AS mergedProps, nodes

CALL apoc.create.vNode(['vSub', 'VirtualNode'], mergedProps) YIELD node AS virtualNode

WITH virtualNode, nodes
UNWIND nodes AS n
MATCH (n)-[rOut]->(relatedOutboundNode)
CALL apoc.create.vRelationship(virtualNode, type(rOut), apoc.any.properties(rOut), relatedOutboundNode) YIELD rel AS vOutRel

WITH virtualNode, n, nodes, collect(relatedOutboundNode) AS relatedOutboundNodes, collect(rOut) AS rOuts, collect(vOutRel) AS vOutRels

// Handle inbound relationships with DISTINCT filtering
MATCH (n)<-[rIn]-(relatedInboundNode)
WITH DISTINCT virtualNode, rIn, nodes, relatedInboundNode, relatedOutboundNodes, rOuts, vOutRels

WHERE NOT (relatedInboundNode)-[:subRel]->(virtualNode)
CALL apoc.create.vRelationship(relatedInboundNode, type(rIn), apoc.any.properties(rIn), virtualNode) YIELD rel AS vInRel

// Unwind the collected outbound relationships
WITH virtualNode, rIn, nodes, relatedInboundNode, vInRel, relatedOutboundNodes, rOuts, vOutRels
UNWIND range(0, size(relatedOutboundNodes) - 1) AS idx
WITH virtualNode, rIn, nodes, relatedInboundNode, vInRel, relatedOutboundNodes[idx] AS relatedOutboundNode, rOuts[idx] AS rOut, vOutRels[idx] AS vOutRel

RETURN virtualNode, rOut, rIn, relatedOutboundNode, nodes, vOutRel, relatedInboundNode, vInRel

The duplication comes from a cartesian product between the two MATCH clauses that follow the UNWIND. The DISTINCT passes on two rows with the same 'n', one for each of that node's outgoing relationships, so the MATCH after the DISTINCT executes once per outgoing relationship for the same 'n'. Because the same value of 'n' reaches the second MATCH multiple times, you get multiple identical relationships.
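
A quick way to see this row multiplication is to count the rows each 'n' produces across the two MATCH clauses. This is only a diagnostic sketch against the sample dataset from the question; the names outNode, inNode, rowCount, outgoing and incoming are made up for illustration:

MATCH (n:Sub)
MATCH (n)-[rOut]->(outNode)
MATCH (n)<-[rIn]-(inNode)
// One row per (outgoing, incoming) combination for each n
RETURN n,
       count(*) AS rowCount,
       count(DISTINCT rOut) AS outgoing,
       count(DISTINCT rIn) AS incoming

On that dataset sub2 should come back with a rowCount of 2 (two :value relationships times one :subRel), which is why its inbound virtual relationship gets created twice.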

There are several approaches. I see you figured out one, which is to collect the results for each value of 'n' passed from the first phase of your query. Doing this results in only one row per node 'n' being passed on, since the multiple relationships for the given 'n' have been collected into lists.
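
Stripped of the virtual node handling, the effect of collecting can be sketched like this. It is a minimal illustration rather than a replacement for your query, and the names outNode, inNode, rOuts, outgoing and inboundRows are assumptions of mine:

MATCH (n:Sub)
MATCH (n)-[rOut]->(outNode)
// Aggregating here collapses the rows back to one per n
WITH n, collect(rOut) AS rOuts
MATCH (n)<-[rIn]-(inNode)
// The inbound MATCH now runs once per n, not once per outgoing relationship
RETURN n, size(rOuts) AS outgoing, count(rIn) AS inboundRows

The list can be unwound again later, once the inbound relationships have been handled, which is what your updated query does.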

Another approach, which seems cleaner, is to use a UNION, where one part of the query creates the outgoing relationships and the other creates the incoming relationships. This works because each part can be written to return the same columns.

Result from first query, with the original nodes and relationships removed to focus on the virtual node and relationships:

Refactored query using UNION approach:

MATCH (n:Sub)
WITH collect(n) AS nodes
WITH apoc.map.mergeList([node IN nodes | apoc.any.properties(node)]) AS mergedProps, nodes
CALL apoc.create.vNode(['vSub', 'VirtualNode'], mergedProps) YIELD node AS virtualNode
CALL {
        WITH virtualNode, nodes
        UNWIND nodes AS n
        MATCH (n)-[r]->(relatedNode)
        WITH virtualNode, n, r, relatedNode
        CALL apoc.create.vRelationship(virtualNode, type(r), apoc.any.properties(r), relatedNode) YIELD rel AS vRel
        RETURN n, vRel, relatedNode
    UNION
        WITH virtualNode, nodes
        UNWIND nodes AS n
        MATCH (n)<-[r]-(relatedNode)
        WITH virtualNode, n, r, relatedNode
        CALL apoc.create.vRelationship(relatedNode, type(r), apoc.any.properties(r), virtualNode) YIELD rel AS vRel
        RETURN n, vRel, relatedNode
}
RETURN virtualNode, relatedNode, vRel
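
If you are on a recent Neo4j release, the same query can probably be written with a variable scope clause on the subquery instead of the importing WITH in each branch. I am assuming Neo4j 5.23 or later here, which is where I believe that syntax was introduced, so treat this as an untested sketch:

MATCH (n:Sub)
WITH collect(n) AS nodes
WITH apoc.map.mergeList([node IN nodes | apoc.any.properties(node)]) AS mergedProps, nodes
CALL apoc.create.vNode(['vSub', 'VirtualNode'], mergedProps) YIELD node AS virtualNode
// Scope clause: both UNION branches can see virtualNode and nodes
CALL (virtualNode, nodes) {
        UNWIND nodes AS n
        MATCH (n)-[r]->(relatedNode)
        CALL apoc.create.vRelationship(virtualNode, type(r), apoc.any.properties(r), relatedNode) YIELD rel AS vRel
        RETURN n, vRel, relatedNode
    UNION
        UNWIND nodes AS n
        MATCH (n)<-[r]-(relatedNode)
        CALL apoc.create.vRelationship(relatedNode, type(r), apoc.any.properties(r), virtualNode) YIELD rel AS vRel
        RETURN n, vRel, relatedNode
}
RETURN virtualNode, relatedNode, vRel

On older versions the importing WITH form above is the one to use.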

Yes! This is definitely cleaner, it's much clearer what is going on. Thanks so much for responding, @glilienfield, this was very insightful!
