Creating virtual relationships between nodes

I have searched everywhere for a solution and stared at this problem for a long time, and I still cannot figure out why apoc.create.vRelationship does not create edges correctly. I created a toy example to demonstrate the issue. I have a product/company graph where Products are connected like "Product p2 NEEDED_IN Product p1" and Companies are connected to Products like "Company C1 DEVELOPES Product p1". In this example, if "Company C2 DEVELOPES Product p2", then "Company C2 SUPPLIES_PRODUCT to C1". I am trying to use apoc.create.vRelationship to create a temporary relationship "Company C2 SUPPLIES_PRODUCT to C1" between Companies whose Products depend on each other.

Data set

CREATE (C1:Company{name : 'C1'})
CREATE (C2:Company{name : 'C2'})
CREATE (C3:Company{name : 'C3'})

CREATE (p1:Product{PRODUCT_NAME:'p1'})
CREATE (p2:Product{PRODUCT_NAME:'p2'})
CREATE (p3:Product{PRODUCT_NAME:'p3'})
CREATE (p4:Product{PRODUCT_NAME:'p4'})
CREATE (p5:Product{PRODUCT_NAME:'p5'})
CREATE (p6:Product{PRODUCT_NAME:'p6'})

CREATE (p2)-[:NEEDED_IN]->(p1)
CREATE (p3)-[:NEEDED_IN]->(p1)
CREATE (p4)-[:NEEDED_IN]->(p2)
CREATE (p5)-[:NEEDED_IN]->(p2)
CREATE (p6)-[:NEEDED_IN]->(p2)
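
The DEVELOPES relationships are not listed in the data set above. The following is a sketch of statements that would be consistent with the rest of this thread (C1 develops p1, p5 and p6; C2 develops p2 and p3; C3 develops p4). This assignment is inferred from the discussion, not copied from the original post, so treat it as an assumption.

```cypher
// Assumed ownership, inferred from the thread (not part of the original data set).
// Each pair is [company name, product name].
UNWIND [['C1','p1'], ['C2','p2'], ['C2','p3'],
        ['C3','p4'], ['C1','p5'], ['C1','p6']] AS pair
MATCH (c:Company {name: pair[0]}), (p:Product {PRODUCT_NAME: pair[1]})
CREATE (c)-[:DEVELOPES]->(p)
```

Run this as a separate statement after the CREATE statements above, since it looks the nodes up by property rather than reusing their variables.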


Cypher Query

MATCH (C2:Company)-[:DEVELOPES]->(p2)-[:NEEDED_IN]->(p1)<-[:DEVELOPES]-(C1:Company{name : 'C1'})
MATCH (C3:Company)-[:DEVELOPES]->(p3)-[:NEEDED_IN]->(p2)
WITH DISTINCT C1, C2, C3, p1, p2, p3
RETURN C1, C2, C3, apoc.create.vRelationship(C2,'SUPPLIES_PRODUCT',{products_provided: collect(p2.PRODUCT_NAME), name: 'level1'} ,C1), apoc.create.vRelationship(C3,'SUPPLIES_PRODUCT',{products_provided: collect(p3.PRODUCT_NAME), name: 'level2'} ,C2)

The problem: I expect 2 edges from Company C2 to Company C1, one for Product p2 and one for Product p3. However, that is not the case. I get an edge that lists Product p2 twice, and Product p3 is missing entirely. apoc.create.vRelationship sometimes creates duplicate relationships and sometimes misses a relationship. Any help is appreciated.

Versions: Neo4j 3.5

The problem is in your MATCH patterns.

Let's first look at why p3 isn't showing up as a result.

Here's the first line of your query:

MATCH (C2:Company)-[:DEVELOPES]->(p2)-[:NEEDED_IN]->(p1)<-[:DEVELOPES]-(C1:Company{name : 'C1'})

Only your p1 node fits the pattern and is bound to the p1 variable, since C1 develops it, and it has incoming relationships of type :NEEDED_IN from some node p2 developed by a company.

At this point, only nodes p3 and p2 can match this pattern for the variable p2.

However, your next MATCH adds a restriction on this:

MATCH (C3:Company)-[:DEVELOPES]->(p3)-[:NEEDED_IN]->(p2)

This means that the node for variable p2 must have an incoming :NEEDED_IN relationship from some node developed by a Company.

Node p3 in your graph does not have any incoming :NEEDED_IN relationships, so it is filtered out. This is why p3 is not in your results: it doesn't fit the pattern you specified.

Node p2 however does fit the pattern, as it has incoming :NEEDED_IN relationships from nodes p4 (developed by C3), p5 and p6 (both developed by C1). You didn't have any other restrictions on variable C3 besides that it's a :Company node, so there are no problems with it matching to node C1 in your graph.
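
One way to check this yourself is to run only the first MATCH and count the incoming :NEEDED_IN relationships of each candidate for p2. This is a minimal diagnostic sketch, assuming Neo4j 3.5 as in the question (the size() over a pattern used here is 3.5-era syntax); the column aliases are just illustrative names.

```cypher
// Candidates for variable p2 after the first MATCH only.
MATCH (C2:Company)-[:DEVELOPES]->(p2)-[:NEEDED_IN]->(p1)<-[:DEVELOPES]-(C1:Company {name: 'C1'})
RETURN C2.name AS supplier,
       p2.PRODUCT_NAME AS candidate,
       // A candidate with 0 here cannot satisfy the second MATCH and is dropped.
       size((p2)<-[:NEEDED_IN]-()) AS incomingNeededIn
```

Any candidate whose count is 0 disappears as soon as the second MATCH is added.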

Thus your final graph:
Company C3 provides product p4 to company C2 (level 2 for product p2)
Company C1 provides products p5 and p6 to C2 (level 2 for product p2)
Company C2 provides product p2 to C1 (level 1)

Why does p2 show up in the products list twice, and why are there two relationships for p2? We can see why if we look at your results mid-query after your matches:

MATCH (C2:Company)-[:DEVELOPES]->(p2)-[:NEEDED_IN]->(p1)<-[:DEVELOPES]-(C1:Company{name : 'C1'})
MATCH (C3:Company)-[:DEVELOPES]->(p3)-[:NEEDED_IN]->(p2)
│"C1"         │"C2"         │"C3"         │"p2.PRODUCT_NAME"│"p3.PRODUCT_NAME"│
│{"name":"C1"}│{"name":"C2"}│{"name":"C1"}│"p2"             │"p6"             │
│{"name":"C1"}│{"name":"C2"}│{"name":"C1"}│"p2"             │"p5"             │
│{"name":"C1"}│{"name":"C2"}│{"name":"C3"}│"p2"             │"p4"             │

When you aggregate as in your collect() usage, the non-aggregation variables become the grouping key. You're doing a lot at once here so it's harder for you to see what's actually going on. Here's a simplification, leaving out the relationship creation:

MATCH (C2:Company)-[:DEVELOPES]->(p2)-[:NEEDED_IN]->(p1)<-[:DEVELOPES]-(C1:Company{name : 'C1'})
MATCH (C3:Company)-[:DEVELOPES]->(p3)-[:NEEDED_IN]->(p2)
RETURN C1, C2, C3, collect(p2.PRODUCT_NAME) as p2Products, collect(p3.PRODUCT_NAME) as p3Products
│"C1"         │"C2"         │"C3"         │"p2Products"│"p3Products"│
│{"name":"C1"}│{"name":"C2"}│{"name":"C1"}│["p2","p2"] │["p6","p5"] │
│{"name":"C1"}│{"name":"C2"}│{"name":"C3"}│["p2"]      │["p4"]      │

In the first row, there are two p2 entries (in the first relationship from C2 to C1), because the first two rows of the previous table share the same grouping key: C1 = C1, C2 = C2, and C3 = C1. To deduplicate, you can use collect(DISTINCT p2.PRODUCT_NAME) instead when you collect (and likewise for p3).

But you still have the second row, which will create a second relationship from C2 to C1 with a single p2 as the list entry.
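
For reference, the deduplicated aggregation mentioned above could look like the sketch below. It only removes the repeated p2 entry within each group; it does not merge the two rows, because C3 is still part of the grouping key.

```cypher
MATCH (C2:Company)-[:DEVELOPES]->(p2)-[:NEEDED_IN]->(p1)<-[:DEVELOPES]-(C1:Company {name: 'C1'})
MATCH (C3:Company)-[:DEVELOPES]->(p3)-[:NEEDED_IN]->(p2)
// C1, C2 and C3 form the grouping key; DISTINCT dedupes inside each group.
RETURN C1, C2, C3,
       collect(DISTINCT p2.PRODUCT_NAME) AS p2Products,
       collect(DISTINCT p3.PRODUCT_NAME) AS p3Products
```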

The main point here is that you're trying to do too much at once: the match patterns you're looking for interfere with each other and mess up your aggregations.

A better approach here is to use a variable-length match pattern on the :NEEDED_IN relationships. We're building each path backward from a product that C1 develops, so that product is always at the end of the path, and expanding adds nodes to the start of the path. Take the two product nodes most distant from C1's product (the first two nodes of the path) as the needed / needing pair, MATCH to their respective companies, aggregate the needed products using the level and the companies as the grouping key, then create and return the vRels:

MATCH (prod)<-[:DEVELOPES]-(C1:Company{name : 'C1'})
MATCH path = ()-[:NEEDED_IN*]->(prod)
WITH length(path) as level, nodes(path)[0] as needed, nodes(path)[1] as needing
MATCH (supplier)-[:DEVELOPES]->(needed), (receiver)-[:DEVELOPES]->(needing)
WITH level, supplier, receiver, collect(DISTINCT needed.PRODUCT_NAME) as prods
RETURN supplier, receiver, apoc.create.vRelationship(supplier, 'SUPPLIES_PRODUCT', {products_provided:prods, level:level}, receiver) as rel

This works for the entire supply chain, not just up to level 2. You can limit this by adding an upper bound on the :NEEDED_IN variable-length relationship pattern.
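
As a sketch of that upper bound, the same query restricted to two hops would look like this. The value 2 is only an example; use whatever depth you need.

```cypher
MATCH (prod)<-[:DEVELOPES]-(C1:Company {name: 'C1'})
// *..2 limits the expansion to paths of length 1 or 2 (levels 1 and 2).
MATCH path = ()-[:NEEDED_IN*..2]->(prod)
WITH length(path) AS level, nodes(path)[0] AS needed, nodes(path)[1] AS needing
MATCH (supplier)-[:DEVELOPES]->(needed), (receiver)-[:DEVELOPES]->(needing)
WITH level, supplier, receiver, collect(DISTINCT needed.PRODUCT_NAME) AS prods
RETURN supplier, receiver,
       apoc.create.vRelationship(supplier, 'SUPPLIES_PRODUCT', {products_provided: prods, level: level}, receiver) AS rel
```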

Andrew, thank you so much for your response. When I implement this on a large dataset, it is hard to see how far the connections are from the central node (there are too many nodes close to each other). Is there a way to color code the SUPPLIES_PRODUCT relationship edges based on level:level, so that users can distinguish level 1 from level 2 connections? For example, all level 1 relationships become red and all level 2 relationships become blue. Even if I can only do it for 3 or 4 levels, that would be great.

There's not currently a way in the graph results view to color code based on a property. But there are two alternatives you may consider.

The first is to use different types for your levels, instead of just the :SUPPLIES_PRODUCT relationship.

In your RETURN you might do something like:

RETURN supplier, receiver, apoc.create.vRelationship(supplier, 'SUPPLIES_LEVEL_' + level, {products_provided:prods, level:level}, receiver) as rel

That will let you color code based on different relationship types. In the graph results pane, at the top, under the query itself, there will be bubbles for the labels in the view, and beneath those, rectangles for the relationship types in the view. If you click on one of the relationship types, you'll see options at the bottom of the results pane to select colors, line widths, and the caption for those relationships. So not only can you select color and width as you like, you can also change the caption to show the level on the relationship instead of its type.