Hi! I'm currently working on a project using Python and Neo4j 3.5. I'm dealing with a large volume of data, and my goal is to find links between "important" nodes. I wrote the following query to find the shortest path between every pair of "important" nodes (considering only paths of length 1 or 2):
MATCH path = shortestPath( (n1)-[*..2]-(n2) )
WHERE n1:IMPORTANT and n2:IMPORTANT and id(n1)>id(n2)
RETURN path
To get the additional links between intermediate nodes, I complement this result with a second query:
MATCH ()-[r]-()
RETURN r
The result of the second query is then filtered (in Python) to keep only the relationships between the nodes obtained from the first query.
I'm trying to improve the code so that the result can be obtained with a single query. I wrote the following query:
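A minimal sketch of that filtering step, assuming the relationships are reduced to plain (start, end) tuples. The function name `keep_internal_relationships` and the toy data (which mirrors the small example graph set up further down) are illustrative only, not the actual project code:

```python
def keep_internal_relationships(relationships, node_ids):
    """Keep only relationships whose two endpoints are both in node_ids.

    relationships: iterable of (start, end) pairs, a hypothetical stand-in
                   for the rows returned by the second query.
    node_ids:      identifiers of the nodes returned by the first query.
    """
    wanted = set(node_ids)
    seen = set()
    kept = []
    for start, end in relationships:
        # Undirected pattern: (a, b) and (b, a) describe the same edge,
        # so each pair of endpoints is kept only once.
        key = frozenset((start, end))
        if start in wanted and end in wanted and key not in seen:
            seen.add(key)
            kept.append((start, end))
    return kept


# Toy data: node names are used as identifiers (an assumption for this sketch).
path_nodes = {"Emma", "David", "Peter", "Paul", "Mary", "Jane"}
all_relationships = [
    ("Emma", "David"), ("David", "Paul"), ("David", "Jane"),
    ("Paul", "Peter"), ("Paul", "Jane"), ("Mary", "Jane"), ("John", "Emma"),
]
internal = keep_internal_relationships(all_relationships, path_nodes)
print(internal)
```

On this toy data the John-Emma relationship is dropped because John is not on any of the paths, while Paul-Jane is kept because both endpoints are.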
MATCH path = shortestPath( (s1)-[*..2]-(s2) )
WHERE s1:IMPORTANT and s2:IMPORTANT and id(s1)>id(s2)
WITH nodes(path) as nodeslist
MATCH p = (m)-[r]-(n)
WHERE m in nodeslist AND n in nodeslist AND id(m)>id(n)
RETURN p
However, this doesn't seem to work: the links between intermediate (non-important) nodes are not returned.
Here is a small setup to reproduce the problem:
MERGE (n1:IMPORTANT {name:'Emma'})
MERGE (n2:IMPORTANT {name:'David'})
MERGE (n3:IMPORTANT {name:'Peter'})
MERGE (n4:NEUTRAL {name:'Paul'})
MERGE (n5:IMPORTANT {name:'Mary'})
MERGE (n6:NEUTRAL {name:'Jane'})
MERGE (n7:NEUTRAL {name:'John'})
MERGE (n1) - [r1:KNOWS] - (n2)
MERGE (n2) - [r2:KNOWS] - (n4)
MERGE (n2) - [r3:KNOWS] - (n6)
MERGE (n4) - [r4:KNOWS] - (n3)
MERGE (n4) - [r5:KNOWS] - (n6)
MERGE (n5) - [r6:KNOWS] - (n6)
MERGE (n7) - [r7:KNOWS] - (n1)
The full graph (edges are undirected):
[image: complete graph]
Expected result:
[image: expected result]
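The expected result can be cross-checked without Neo4j. The sketch below is plain Python, not Cypher. It assumes node names can stand in for node ids, and the helper `short_path_nodes` is a hypothetical name introduced just for this check. It takes the edges from the setup above, collects the nodes on a path of length 1 or 2 between each pair of IMPORTANT nodes, and then keeps every edge whose two endpoints are both in that set:

```python
from itertools import combinations

# Edges and labels copied from the setup above; names stand in for node ids.
edges = [
    ("Emma", "David"), ("David", "Paul"), ("David", "Jane"),
    ("Paul", "Peter"), ("Paul", "Jane"), ("Mary", "Jane"), ("John", "Emma"),
]
important = {"Emma", "David", "Peter", "Mary"}

# Undirected adjacency map.
neighbours = {}
for a, b in edges:
    neighbours.setdefault(a, set()).add(b)
    neighbours.setdefault(b, set()).add(a)


def short_path_nodes(a, b):
    """Nodes on one shortest path of length 1 or 2 between a and b, else an empty set."""
    if b in neighbours[a]:
        return {a, b}
    common = neighbours[a] & neighbours[b]
    if common:
        # min() only makes the choice of the middle node deterministic.
        return {a, b, min(common)}
    return set()


# Union of the path nodes over all pairs of important nodes.
expected_nodes = set()
for a, b in combinations(sorted(important), 2):
    expected_nodes |= short_path_nodes(a, b)

# Every edge whose two endpoints are both in that union.
expected_edges = [(a, b) for a, b in edges if a in expected_nodes and b in expected_nodes]

print(sorted(expected_nodes))
print(expected_edges)
```

On this graph the check yields 6 nodes (everyone except John) and 6 relationships, including Paul-Jane.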
Python code:
from neo4j import GraphDatabase

# uri, username and password are defined elsewhere
driver = GraphDatabase.driver(uri, auth=(username, password))
session = driver.session()
query = """MATCH path = shortestPath( (s1)-[*..2]-(s2) )
WHERE s1:IMPORTANT and s2:IMPORTANT and id(s1)>id(s2)
WITH nodes(path) as nodeslist
MATCH p = (m)-[r]-(n)
WHERE m in nodeslist AND n in nodeslist AND id(m)>id(n)
RETURN p"""
graph = session.run(query).graph()
print([n["name"] for n in graph.nodes])
print(["-".join([n["name"] for n in r.nodes]) for r in graph.relationships])
Result: 6 nodes and 5 relationships (instead of 6 relationships). The edge between Paul and Jane is missing.
['David', 'Emma', 'Paul', 'Peter', 'Jane', 'Mary']
['Emma-David', 'Paul-Peter', 'David-Paul', 'David-Jane', 'Mary-Jane']
Is my query wrong? The Neo4j version is 3.5 and the neo4j Python driver version is 4.4.4.