Path misses relationship

I have a database with approx 50k nodes. Then I import a test CSV file with a header and 7 data lines:

"Father","Mother"
"I000000","I000003"
"I000003","I000013"
"I000013","I000011"
"I000011","I000243"
"I000243","I000084"
"I000000","I000089"
"I000011","I000088"

with

LOAD CSV WITH HEADERS FROM 'file:///C:/Temp/Families.csv' AS line
MATCH (p1:Person), (p2:Person)
WHERE p1.GID = line.Father AND p2.GID = line.Mother
MERGE (p1)-[r:Marriage]-(p2);

This should create 7 bidirectional relationships which is exactly what it does. The resulting network is displayed with

MATCH (p1:Person)-[r]-(p2:Person)
RETURN r, p1, p2;


Everything looks perfect.

Then I run

MATCH path = (p1:Person)-[:Marriage*6]->(p2:Person)
RETURN path, p1, p2;

and get nothing back:

(no changes, no records)

Since I have bidirectional relationships, I feel the path between the nodes I000089 and I000084 should be returned since this path has exactly 6 edges.

So I run

MATCH path = (p1:Person)-[:Marriage*5]->(p2:Person)
RETURN path, p1, p2;

and get back the path between nodes I000000 and I000084. I would expect the paths between nodes I000089 and I000088 and between nodes I000089 and I000243 to be returned as well, since those two paths also have five edges.

I now run

MATCH path = (p1:Person)-[:Marriage*4]->(p2:Person)
RETURN path, p1, p2;

and get back the network as shown above with the exception of node I000089.

So it appears as if node I000089 is being ignored whenever I'm looking for a path though there is an edge between nodes I000089 and I000000. What's happening here?

Thanks for your comments!

Ulrich

This is because the syntax ()-[*6]->() asks to match the pattern with exactly 6 hops, all followed in the direction of the arrow. Your longest directed path is 5 hops, which is why it returned results when you decreased it to 5 or fewer.

I think you want any path from 1 to 6 hops. This syntax is ()-[*1..6]->() or ()-[*..6]->() (shorthand).

MATCH path = (p1:Person)-[:Marriage*..6]->(p2:Person)
RETURN path, p1, p2;
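
To make the hop counting concrete, here is a minimal plain-Python sketch (not Neo4j itself) that models the seven rows of the CSV as directed Father -> Mother edges and lists every path of exactly n hops that follows the stored direction. The names `EDGES` and `directed_paths` are made up for this illustration, and the model assumes each relationship was created left-to-right as described in the note below.

```python
# Plain-Python model of the seven relationships as stored (Father -> Mother).
# This only imitates the matching of (a)-[:Marriage*n]->(b); it is not Neo4j.
EDGES = [
    ("I000000", "I000003"),
    ("I000003", "I000013"),
    ("I000013", "I000011"),
    ("I000011", "I000243"),
    ("I000243", "I000084"),
    ("I000000", "I000089"),
    ("I000011", "I000088"),
]


def directed_paths(edges, hops):
    """Return all node sequences that follow exactly `hops` edges in stored direction."""
    outgoing = {}
    for src, dst in edges:
        outgoing.setdefault(src, []).append(dst)
    nodes = sorted({node for edge in edges for node in edge})
    paths = [[node] for node in nodes]
    for _ in range(hops):
        # Extend every partial path by one outgoing edge; dead ends drop out.
        paths = [path + [nxt] for path in paths for nxt in outgoing.get(path[-1], [])]
    return paths


if __name__ == "__main__":
    for n in (6, 5, 4):
        print(n, directed_paths(EDGES, n))
```

In this model, 6 hops gives no paths, 5 hops gives only I000000 to I000084, and none of the 4-hop paths contains I000089, because I000089 only has an incoming edge and so can never be the start or the middle of a directed path.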

Note: Neo4j does not have bidirectional relationships. In the MERGE statement where you did not specify a direction, Neo4j will try to match in both directions. If neither direction is found and the relationship needs to be created, Neo4j will create it from left to right.
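
The behaviour described in this note can be imitated with a small Python sketch. It is only a model of the rule as stated above (match in either direction, otherwise create left-to-right), and `merge_undirected` is a hypothetical helper name, not a Neo4j API.

```python
def merge_undirected(rels, left, right, rel_type="Marriage"):
    """Imitate MERGE (left)-[:Marriage]-(right) on a set of (start, type, end) tuples.

    Returns True if a relationship was created, False if an existing one matched.
    """
    if (left, rel_type, right) in rels or (right, rel_type, left) in rels:
        return False  # a relationship in either direction satisfies the pattern
    rels.add((left, rel_type, right))  # nothing matched: create left -> right
    return True


if __name__ == "__main__":
    rels = set()
    print(merge_undirected(rels, "I000000", "I000003"))  # first call creates
    print(merge_undirected(rels, "I000003", "I000000"))  # reversed call matches
    print(rels)
```

After both calls the set holds a single relationship, stored from I000000 to I000003, which is why the import produced 7 directed relationships rather than 7 bidirectional ones.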

Ok, this explains the results I get. So if I really want bidirectional relationships as I need in my model, I would have to import my edges two times:

LOAD CSV WITH HEADERS FROM 'file:///C:/Temp/Families.csv' AS line
MATCH (p1:Person), (p2:Person)
WHERE p1.GID = line.Father AND p2.GID = line.Mother
MERGE (p1)-[r:Marriage]->(p2);

LOAD CSV WITH HEADERS FROM 'file:///C:/Temp/Families.csv' AS line
MATCH (p1:Person), (p2:Person)
WHERE p1.GID = line.Father AND p2.GID = line.Mother
MERGE (p2)-[r:Marriage]->(p1);

Then I really get all nodes with relationships when saying:

MATCH path = (p1:Person)-[:Marriage*6]->(p2:Person)
RETURN path, p1, p2;

Being interested in the longest path I say:

MATCH path = (p1:Person)-[:Marriage*]->(p2:Person)
WITH path, length(path) AS chain_length
ORDER BY chain_length DESC
LIMIT 1
RETURN chain_length;

The result is 14 since the chain is "traversed" in both directions. Is there a better approach?
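
The 14 can be reproduced with a small plain-Python model of the test data. The sketch assumes that a variable-length match may use each relationship at most once per path, and it searches the 14 directed relationships (7 rows, each stored in both directions) for the longest such path. `EDGES` and `longest_trail` are illustrative names.

```python
# The seven CSV rows as Father -> Mother edges (plain-Python model, not Neo4j).
EDGES = [
    ("I000000", "I000003"),
    ("I000003", "I000013"),
    ("I000013", "I000011"),
    ("I000011", "I000243"),
    ("I000243", "I000084"),
    ("I000000", "I000089"),
    ("I000011", "I000088"),
]


def longest_trail(rels):
    """Length of the longest directed path that uses each relationship at most once."""
    outgoing = {}
    for idx, (src, dst) in enumerate(rels):
        outgoing.setdefault(src, []).append((idx, dst))
    best = 0

    def extend(node, used, length):
        nonlocal best
        best = max(best, length)
        for idx, nxt in outgoing.get(node, []):
            if idx not in used:
                used.add(idx)
                extend(nxt, used, length + 1)
                used.remove(idx)

    for start in outgoing:
        extend(start, set(), 0)
    return best


if __name__ == "__main__":
    doubled = EDGES + [(dst, src) for src, dst in EDGES]
    print(longest_trail(EDGES))    # relationships stored in one direction only
    print(longest_trail(doubled))  # relationships stored in both directions
```

With single-direction edges the longest path has 5 hops. With doubled edges the search can walk out along every branch and back again, so it uses all 14 relationships.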

Thanks for your support!

You can do it in one import

LOAD CSV WITH HEADERS FROM 'file:///C:/Temp/Families.csv' AS line
MATCH (p1:Person), (p2:Person)
WHERE p1.GID = line.Father AND p2.GID = line.Mother
MERGE (p1)-[r:Marriage]->(p2)
MERGE (p2)-[r:Marriage]->(p1)

I don’t see the point in having a relationship in both directions unless you had relationship properties that were different in each direction.

Since the direction seems irrelevant to your data model, you can choose either direction when creating the relationships and use patterns that don’t specify a direction. You need to be careful here, as you will get paths in both directions.

MATCH path = (p1:Person)-[:Marriage*6]-(p2:Person)
RETURN path, p1, p2;
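
What "paths in both directions" means can be seen in a small plain-Python model of the test data. The sketch assumes one stored relationship per CSV row and ignores direction while matching, with each relationship used at most once per path. `EDGES` and `undirected_paths` are illustrative names.

```python
# The seven CSV rows, one stored relationship each (plain-Python model, not Neo4j).
EDGES = [
    ("I000000", "I000003"),
    ("I000003", "I000013"),
    ("I000013", "I000011"),
    ("I000011", "I000243"),
    ("I000243", "I000084"),
    ("I000000", "I000089"),
    ("I000011", "I000088"),
]


def undirected_paths(edges, hops):
    """Return all node sequences of exactly `hops` edges, ignoring edge direction."""
    neighbours = {}
    for idx, (a, b) in enumerate(edges):
        neighbours.setdefault(a, []).append((idx, b))
        neighbours.setdefault(b, []).append((idx, a))
    found = []

    def extend(nodes, used):
        if len(used) == hops:
            found.append(nodes)
            return
        for idx, nxt in neighbours[nodes[-1]]:
            if idx not in used:  # each relationship at most once per path
                extend(nodes + [nxt], used | {idx})

    for start in sorted(neighbours):
        extend([start], frozenset())
    return found


if __name__ == "__main__":
    for path in undirected_paths(EDGES, 6):
        print(" - ".join(path))
```

For 6 hops the model returns the chain between I000089 and I000084 twice, once starting from each end. For 5 hops it returns six paths, which are the three chains mentioned in the question, each in both directions.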

I guess you have a typo in your code. The correct version would be

LOAD CSV WITH HEADERS FROM 'file:///C:/Temp/Families.csv' AS line
MATCH (p1:Person), (p2:Person)
WHERE p1.GID = line.Father AND p2.GID = line.Mother
MERGE (p1)-[r1:Marriage]->(p2)
MERGE (p2)-[r2:Marriage]->(p1)

or an error message pops up saying that r has already been defined. Obviously there must be two different "r's" in the MERGE statements.

As I mentioned I'm interested in the longest path. The problem with specifying no direction appears to be speed. My version (pattern with direction) would be

MATCH path = (:Person)-[:Marriage*]->(:Person)
WITH path, length(path) AS chain_length
ORDER BY chain_length DESC
LIMIT 1
RETURN path;

This returns a result in less than a second, which is really impressive: 13 nodes and 24 relationships in my full model with 50k nodes and 57k edges, i.e. 12 non-directional edges between the 13 nodes. Specifying no direction in

MATCH path = (:Person)-[:Marriage*]-(:Person)
WITH path, length(path) AS chain_length
ORDER BY chain_length DESC
LIMIT 1
RETURN path;

still runs :wink:

You are correct about the code. I sloppily copied your code and didn’t remove the variable ‘r’, which is why it ended up duplicated. Your approach of defining different variable names works too, but the variable is not needed since it is not referenced anywhere.

The non-directional one is going to be problematic on a decent-sized dataset, as all paths and segments are going to match, and in each direction too.
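
A rough feel for the difference can be had from a toy count on the seven test rows. The Python sketch below is only a model under stated assumptions: relationships stored in both directions (14 in total), each relationship used at most once per path, and every path of one or more hops counted as a match. `count_trails` is an illustrative name, and the counts say nothing about actual Neo4j timings.

```python
# The seven CSV rows as Father -> Mother edges (plain-Python model, not Neo4j).
EDGES = [
    ("I000000", "I000003"),
    ("I000003", "I000013"),
    ("I000013", "I000011"),
    ("I000011", "I000243"),
    ("I000243", "I000084"),
    ("I000000", "I000089"),
    ("I000011", "I000088"),
]


def count_trails(rels, directed):
    """Count paths of one or more hops that use each relationship at most once."""
    adjacent = {}
    for idx, (src, dst) in enumerate(rels):
        adjacent.setdefault(src, []).append((idx, dst))
        if not directed:
            # Without a direction, the same relationship can also be walked backwards.
            adjacent.setdefault(dst, []).append((idx, src))

    def extend(node, used):
        total = 0
        for idx, nxt in adjacent.get(node, []):
            if idx not in used:
                total += 1 + extend(nxt, used | {idx})
        return total

    return sum(extend(start, frozenset()) for start in adjacent)


if __name__ == "__main__":
    doubled = EDGES + [(dst, src) for src, dst in EDGES]
    print("directed pattern:  ", count_trails(doubled, directed=True))
    print("undirected pattern:", count_trails(doubled, directed=False))
```

Even on eight nodes the undirected count is noticeably larger, because every hop can be made over either of the two stored relationships.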

Ah, of course, thanks for pointing this out.

Agreed -- it is still running :wink:

Thanks a lot Gary, I learned very much from your comments!
