apoc.create.vRelationship and long paths

guido · April 24, 2023, 12:38pm

I have a simple query in NeoDash:

MATCH (n)<-[r:HAS_NAME_VARIANT {qualification: "preferred"}]-(p:Producer)-[r3:IS_RELATED_TO*]-(f:Family)-[r4:IS_RELATED_TO*] -(p2)-[r2:HAS_NAME_VARIANT {qualification: "preferred"}]->(n2:Name)
WHERE n.NameID = $neodash_name_nameid
RETURN *

that returns the following genealogical information:

It should be read as follows: Harry Peters was related to Maria Borrewater, who was related to Remy la Motte and Mathias van der Goes, etc.
The nodes marked "P..." are necessary in the query but I don't want/need them in the graph.
I therefore tried apoc.create.vRelationship, but that doesn't work:

Harry Peters is related to Maria Borrewater (via Family) but not to the others, etc.
Is the path length indication * in the r4-relationship not compatible with apoc.create.vRelationship?
So my question is: how can I replace the whole MATCH-string with (n)-[vRelationship]-(n2) and still obtain the same names with the correct relationships as in the first picture?
Thank you for any suggestion!

jalakoo · April 24, 2023, 4:25pm

Could you repost your query with the apoc.create.vRelationship() function?

guido · April 24, 2023, 4:50pm

Sorry about that!

MATCH (n)<-[r:HAS_NAME_VARIANT {qualification: "preferred"}]-(p:Producer)-[r3:IS_RELATED_TO*]-(f:Family)-[r4:IS_RELATED_TO*] -(p2)-[r2:HAS_NAME_VARIANT {qualification: "preferred"}]->(n2:Name)
WHERE n.NameID = $neodash_name_nameid
RETURN DISTINCT n, n2, f, apoc.create.vRelationship(n, 'related to', {}, f), apoc.create.vRelationship(f, 'related to', {}, n2)

glilienfield · April 24, 2023, 8:05pm

It looks like you want the person to be related to its families (if more than one) and then the other people related to the corresponding family, i.e. remove all the 'Producer' nodes.

Try this:

MATCH (n)<-[r:HAS_NAME_VARIANT {qualification: "preferred"}]-(p:Producer)-[r3:IS_RELATED_TO*]-(f:Family)
WHERE n.NameID = $neodash_name_nameid
WITH n, f, apoc.create.vRelationship(n, 'related to', {}, f) as rel1
MATCH (f)-[r4:IS_RELATED_TO*] -(p2)-[r2:HAS_NAME_VARIANT {qualification: "preferred"}]->(n2:Name)
RETURN n, f, rel1, apoc.create.vRelationship(f, 'related to', {}, n2) as rel2, n2

guido · April 25, 2023, 8:32am

Hi Gary,
thanks for your reply, but that doesn't do the trick:

If you would care to work with the original data, this is my test file (remove the .txt):

Family.csv.txt (440 Bytes)

Thanks.

glilienfield · April 27, 2023, 2:09am

Can you diagram what you are looking for? My attempt was to eliminate the Producer nodes.

guido · April 27, 2023, 8:51am

By all means. That looks like the first graph, but without the P-nodes:

I have added the T-indications to the family node for your information; they correspond to the FamID field in the csv and the FamID-property of the Family-node. They should not appear in the graph.
The relationship labels correspond to the Type-field in the csv and are properties of the Family-node as well as of the IS_RELATED_TO-relationship.
Thanks for all your efforts.

glilienfield · May 6, 2023, 4:19am

Is this close?

Full Data Set:

Query:

match (n:Name)-[:HAS_NAME_VARIANT]-(p:Producer)-[:IS_RELATED_TO]-(f:Family)
return n, f, apoc.create.vRelationship(n,'IS_MEMBER_OF', {}, f) as rel

Result:

guido · May 6, 2023, 7:48am

Yes, that comes close.
I have added the restriction that only the preferred name variant is shown and labelled the relationships:

match (n:Name)-[:HAS_NAME_VARIANT{qualification: "preferred"}]-(p:Producer)-[r:IS_RELATED_TO]-(f:Family)
return n, f, apoc.create.vRelationship(n,CASE WHEN r.Type = "child" THEN 'CHILD' ELSE CASE WHEN r.Type = "spouse" THEN 'SPOUSE' ELSE" " END END, {}, f) as rel

and that works fine for this sample where only these 7 Producers are on file:

So we still need to add the restriction that only shows the relationships for the Producer in focus:

match (n:Name)-[:HAS_NAME_VARIANT{qualification: "preferred"}]-(p:Producer)-[r:IS_RELATED_TO]-(f:Family)
WHERE n.NameID = "N189"
return n, f, apoc.create.vRelationship(n,CASE WHEN r.Type = "child" THEN 'CHILD' ELSE CASE WHEN r.Type = "spouse" THEN 'SPOUSE' ELSE" " END END, {}, f) as rel

but then, of course, only the relationships of that specific Producer are returned:

which is why I added the outgoing relationship from f:Family in my original query:

MATCH (n)<-[r:HAS_NAME_VARIANT {qualification: "preferred"}]-(p:Producer)-[r3:IS_RELATED_TO*]-(f:Family)-[r4:IS_RELATED_TO*]-(p2)-[r2:HAS_NAME_VARIANT {qualification: "preferred"}]->(n2:Name)

which works fine when I RETURN *, but not when I want to eliminate the P-nodes with apoc.create.vRelationship

glilienfield · May 6, 2023, 4:58pm

I hope I got the relationship type correct for the case when the other node is multiple hops away from the common family node. I chose to use the type from the relationship directly connected to the common family node.

MATCH (n) WHERE n.NameID = "N189"
CALL {
    WITH n
    MATCH (n)-[:HAS_NAME_VARIANT {qualification: "preferred"}]-(:Producer)-[:IS_RELATED_TO*0..]-()-[r:IS_RELATED_TO]-(f:Family)
    RETURN n as name, f as family, r
    UNION
    WITH n
    MATCH (n)<-[:HAS_NAME_VARIANT {qualification: "preferred"}]-(:Producer)-[:IS_RELATED_TO*]-(f:Family)-[r:IS_RELATED_TO]-()-[:IS_RELATED_TO*0..]-(:Producer)-[:HAS_NAME_VARIANT {qualification: "preferred"}]->(o)
    RETURN o as name, f as family, r
}
RETURN name, family, apoc.create.vRelationship(name, CASE r.Type WHEN "child" THEN 'CHILD'  WHEN "spouse" THEN 'SPOUSE' ELSE "?" END, {}, family) as vRelationship

guido · May 7, 2023, 10:00am

That seems to work fine for Maria Borrewater (N189). I have marked the person in focus with a red node and shown the NameID instead of the Name:

When focussing on another person, we should get the same graph with another red node. But they all look different and weird, e.g. N101 (Hans de Grave), who is a child in only one family, is suddenly linked to all three families:

I tried to find the cause but I can't figure it out...

Btw, the reason why I have established the Family-nodes is shown in the correct graph: Maria Borrewater (N189) has been married twice and I have to distinguish between the children of the two marriages. There is no other way to do this than by including a "dummy node" for the families (which will also include other roles, like "best man", "maid", etc.)

glilienfield · May 7, 2023, 11:57am

Back to the drawing board. I will see if I can figure it out.

glilienfield · May 7, 2023, 9:50pm

The reason node N101 is related to multiple families is that the pattern allows traversing through a family node to additional family nodes, as long as the relationship type joining them is 'IS_RELATED_TO'. This happens because the relationships joining the Producer and Family nodes are all of type 'IS_RELATED_TO'.
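One way to see this on the sample data is to compare the families one hop away from N101's Producer with the families reachable over any number of IS_RELATED_TO hops. This is only a sketch: it assumes the Family nodes carry the FamID property mentioned earlier in the thread and that the node behind the name has the Producer label.

```cypher
// Assumption: Family nodes have a FamID property, Producer nodes the Producer label.
MATCH (n:Name {NameID: "N101"})<-[:HAS_NAME_VARIANT {qualification: "preferred"}]-(p:Producer)
// Families exactly one IS_RELATED_TO hop away from the Producer
OPTIONAL MATCH (p)-[:IS_RELATED_TO]-(direct:Family)
WITH n, p, collect(DISTINCT direct.FamID) AS directFamilies
// Families reachable over any number of IS_RELATED_TO hops
OPTIONAL MATCH (p)-[:IS_RELATED_TO*]-(reachable:Family)
RETURN n.NameID AS name,
       directFamilies,
       collect(DISTINCT reachable.FamID) AS reachableFamilies
```

If reachableFamilies is longer than directFamilies, the extra entries are families that were only reached by passing through other Producer or Family nodes.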

What should the result for N101 be? Is this the correct result for N101? Should the result find all the family nodes connected to the target node through only producer nodes, then all the other people connected to those family nodes via only producer nodes?

guido · May 8, 2023, 9:49am

Hi,
thanks for your tenacity!
Your question made me realise that the N189-Maria Borrewater graph only works because she is related to all Families (T1-3) in the sample.
Once I add Ts that are related to other Producers but not to the one in focus, the graph doesn't work anymore. Nor does the original which includes the P-nodes.
(My goal is to depict dynasties and networks: much like royalty and nobility at the time, tradesmen such as printers also created strong networks by arranging marriages for their daughters while setting up businesses for their sons. My favourite example is the Verdussen dynasty, which spanned the best part of three centuries, consisted of 23 active printers and had matrimonial links with a similar number of other printers. Although it is the most extreme example, I still want to show the whole dynasty.
That means that the query should not only return the direct T-nodes the person-in-focus has a relationship with, but also the T-nodes (and attached N-nodes) of the people with a relationship to those.)
To expand the sample, I have added a few spouse- and child-nodes to N1-Matthias van der Goes through the new families F4, F5 and F6. So you might want to import this expanded family file:

Family.csv.txt (591 Bytes)

I adapted the query with the P-nodes as follows:

MATCH (n:Name)<-[r:HAS_NAME_VARIANT {qualification: "preferred"}]-(p)-[r2:IS_RELATED_TO]->(f:Family)<-[r3:IS_RELATED_TO]-(p4)-[r4:HAS_NAME_VARIANT {qualification: "preferred"}]->(n2:Name)
OPTIONAL MATCH (p4)<-[r5:IS_RELATED_TO*]-()
WHERE n.NameID = "N189"
RETURN *

and that perfectly returns the graph that I want:

and, no matter which NameID I select, the graph remains the same, as it should (only the red colour is on different nodes, of course).
The question remains: how to eliminate the P-nodes...?

glilienfield · May 11, 2023, 5:02pm

I did some research. That query is actually not doing what you think it is doing. I am a little confused myself. In summary, the 'where' clause is not applied. You can see this if you return n.NameID. The optional match is always null since the direction of the relationship is never met. I am getting the same results with just the first match:

MATCH (n:Name)<-[r:HAS_NAME_VARIANT {qualification: "preferred"}]-(p)-[r2:IS_RELATED_TO]->(f:Family)<-[r3:IS_RELATED_TO]-(p4)-[r4:HAS_NAME_VARIANT {qualification: "preferred"}]->(n2:Name)
RETURN *

Results with original query:

MATCH (n:Name) 
MATCH (n)<-[r:HAS_NAME_VARIANT {qualification: "preferred"}]-(p)-[r2:IS_RELATED_TO]->(f:Family)<-[r3:IS_RELATED_TO]-(p4)-[r4:HAS_NAME_VARIANT {qualification: "preferred"}]->(n2:Name)
optional match (p4)<-[:IS_RELATED_TO*]-(x)
with *
WHERE n.NameID = "N189"
return *

If you insert a 'with' clause before the 'where' clause, then the constraint is imposed. I think this may be a defect. Looking at the explain plan, you see that the optional match is performed separately from the first match and the constraint is applied to the optional match only. This is weird since the variable 'n' would not be in scope for the optional match.
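For reference, the Cypher manual describes WHERE as a subclause of the MATCH, OPTIONAL MATCH or WITH it follows rather than a clause of its own, which would explain the plan described above. The two queries below are a minimal sketch of the difference, using only the labels and properties that already appear in this thread; x is just an illustrative variable name.

```cypher
// Run the two queries separately.

// 1) WHERE belongs to the OPTIONAL MATCH: every Name row is kept,
//    and x is null wherever the predicate (or the pattern) fails.
MATCH (n:Name)
OPTIONAL MATCH (n)<-[:HAS_NAME_VARIANT {qualification: "preferred"}]-(x)
WHERE n.NameID = "N189"
RETURN n.NameID, x;

// 2) WHERE belongs to the MATCH: rows for all other names are dropped
//    before the OPTIONAL MATCH runs.
MATCH (n:Name)
WHERE n.NameID = "N189"
OPTIONAL MATCH (n)<-[:HAS_NAME_VARIANT {qualification: "preferred"}]-(x)
RETURN n.NameID, x;
```

Placing `WITH *` between the OPTIONAL MATCH and the WHERE has the same filtering effect as the second form, because the WHERE then attaches to the WITH.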

The explain plan is entirely different if optional match is replaced with match. Then the two matches are done together and the 'where' clause is applied to the results. This is what I would expect in both cases.

MATCH (n:Name) 
MATCH (n)<-[r:HAS_NAME_VARIANT {qualification: "preferred"}]-(p)-[r2:IS_RELATED_TO]->(f:Family)<-[r3:IS_RELATED_TO]-(p4)-[r4:HAS_NAME_VARIANT {qualification: "preferred"}]->(n2:Name)
optional match (p4)<-[:IS_RELATED_TO*]-(x)
with *
WHERE n.NameID = "N189"
return *

guido · May 11, 2023, 5:44pm

I added two more nodes to the sample, related to each other but not to N189:

Family.csv.txt (643 Bytes)

If the WHERE-restriction works properly, they should not appear. And they don't, not with your last query nor with my original one.
But when I remove WHERE:

MATCH (n:Name) 
//WHERE n.NameID = $neodash_name_nameid
MATCH (n)<-[r:HAS_NAME_VARIANT {qualification: "preferred"}]-(p)-[r2:IS_RELATED_TO]->(f:Family)<-[r3:IS_RELATED_TO]-(p4)-[r4:HAS_NAME_VARIANT {qualification: "preferred"}]->(n2:Name)
optional match (p4)<-[:IS_RELATED_TO*]-(x)
with *
//WHERE n.NameID = "N189"
return *

they do appear:

so IMHO the restriction does work... or am I missing something (as usual)?

glilienfield · May 11, 2023, 6:51pm

It was not working with this one. I will check again with your updated family data.

Btw- I have switched to using v5

MATCH (n:Name)<-[r:HAS_NAME_VARIANT {qualification: "preferred"}]-(p)-[r2:IS_RELATED_TO]->(f:Family)<-[r3:IS_RELATED_TO]-(p4)-[r4:HAS_NAME_VARIANT {qualification: "preferred"}]->(n2:Name)
OPTIONAL MATCH (p4)<-[r5:IS_RELATED_TO*]-()
WHERE n.NameID = "N189"
RETURN *

glilienfield · May 11, 2023, 11:27pm

The 'where' clause does not work when placed after the optional match.

It does work when placed with the match for the Name node.

As you can see, with the restriction on the Name node, you lose the 'long tail' that originates from the P1 Producer node, as these nodes are not within the one family node in the pattern. As you can see in the table data above, the id(x) is always null. This shows the optional match is always null. This is because the relationship direction is in the opposite direction when traversing away from Producer P1 (representing p4 in the query).
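A quick way to check the direction claim on the sample data is to count outgoing and incoming IS_RELATED_TO relationships per Producer. This is a sketch that assumes the nodes in question carry the Producer label; rOut and rIn are illustrative variable names.

```cypher
// Assumption: the p/p4 nodes in the queries above are labelled Producer.
MATCH (p:Producer)
OPTIONAL MATCH (p)-[rOut:IS_RELATED_TO]->()
WITH p, count(rOut) AS outgoing
OPTIONAL MATCH (p)<-[rIn:IS_RELATED_TO]-()
RETURN p, outgoing, count(rIn) AS incoming
```

If incoming is 0 for every Producer, a pattern such as (p4)<-[:IS_RELATED_TO*]-() can never match, which is consistent with x always being null.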

I know how to get the virtual graph for the nodes resulting from the 'match'. The question is how to get the tail and then replace the Producer nodes with virtual connections between the common Person and Family nodes.

guido · May 15, 2023, 2:25pm

After a lot of blind testing (trying alternatives without giving much thought to why they would make sense), I came up with the following query:

MATCH (n:Name) 
WHERE n.NameID  = "N111"
//= $neodash_name_nameid
MATCH (n)<-[r:HAS_NAME_VARIANT {qualification: "preferred"}]-(p)-[r2:IS_RELATED_TO]->(f:Family)<-[r3:IS_RELATED_TO]-(p4)-[r4:HAS_NAME_VARIANT {qualification: "preferred"}]->(n2:Name)
OPTIONAL MATCH (p4)-[r5:IS_RELATED_TO*]-(p5)-[r6:HAS_NAME_VARIANT {qualification: "preferred"}]->(n3:Name)
OPTIONAL MATCH (p5)-[r7:IS_RELATED_TO*]->(p6)
RETURN *

It produces exactly the graph it should generate: all nodes and relationships while ignoring the ones that are not linked to f:Family (you can test this by removing the WHERE).

I have no idea why the second optional match should be there, but somehow it has to be. I also have no idea why there are no P-nodes without a Name node attached to them (the second optional match does not link to a Name, but things fall apart when I add that relationship).
So I added yet another Family (T8) to make sure the * works and it does. Result:

New file:
Family.csv.txt (718 Bytes)

Does this help?

glilienfield · May 19, 2023, 6:25pm

How is this? Yellow nodes are Name nodes and blue nodes are Family nodes. The search node (N111) is at the middle-bottom. I basically found all the Producer nodes along the collection of paths and found each Producer's corresponding Family and Name nodes, which were also nodes on the paths. I then created a virtual relationship between each Family/Name pair.

Interestingly, I did not need to explicitly handle nodes n and f, as they were included in the collection of paths 'p1'. This is because the 'p4' nodes resulting from the second match also traced back to the 'n' and 'f' nodes in the third match, along with creating paths to all the other nodes extending from the other names directly related to the anchor Name node 'n'.

MATCH (n:Name) 
WHERE n.NameID  = "N111"
MATCH (n)<-[:HAS_NAME_VARIANT {qualification: "preferred"}]-(p)-[:IS_RELATED_TO]->(f:Family)
MATCH (f)<-[:IS_RELATED_TO]-(p4)
MATCH p1=(p4)-[:IS_RELATED_TO*]-()-[:HAS_NAME_VARIANT {qualification: "preferred"}]->(:Name)
WITH apoc.coll.toSet(reduce(s=[], path in collect(p1) | s+nodes(path))) as all_nodes
WITH all_nodes, [p in all_nodes where 'Producer' in labels(p)] as producers
UNWIND producers as producer
MATCH (n:Family)<-[r:IS_RELATED_TO]-(producer)-[:HAS_NAME_VARIANT {qualification: "preferred"}]->(m:Name)
WHERE n in all_nodes and m in all_nodes
RETURN n, m, apoc.create.vRelationship(n, r.Type, {}, m) as vrel
