Hi, great minds! I am new to neo4j and currently exploring an existing graph to extract data for downstream tasks.
I would like to get all pairs of nodes and their relationship from the graph.
MATCH (n)-[r]-(n1) WHERE n<>n1 AND n1>n RETURN *
This will return about 12,726,288 estimated rows.
Instead, I decided to extract the pairwise information between two node types:
MATCH (n:Node{type:nodetypeA})-[r]-(n1: Node{type:nodetypeB}) WHERE n<>n1 AND id(n)<id(n1) RETURN *
with 653,022 estimated rows; sadly, Neo4j keeps timing out. I have increased the connection timeout (ms) in the Neo4j Browser settings, yet nothing changes.
Why do you think this is a timeout problem? It may instead be an out-of-memory (OOM) rendering problem in Neo4j Desktop: you may be asking for too much information to be displayed. Have you tried with a driver of your preference? Personally, I have used SDN6 with WebFlux without problems.
I agree; probably the 653,022 rows are too much to extract.
Thanks a lot for your suggestion about SDN6 and WebFlux; it's my first time learning a bit about reactive programming. However, it appears the reactive clients provide no support for Python, and I currently run my queries in a Python environment and connect to Neo4j with py2neo.
Any further suggestions will be greatly appreciated.
from neo4j import GraphDatabase

# Placeholders: replace with your own credentials and connection URI.
user = "yourUsername"
password = "yourPassword"
uri = "yourUri"

driver = GraphDatabase.driver(uri, auth=(user, password))
with driver.session() as session:
    result = session.run("MATCH (n:Node{type:nodetypeA})-[r]-(n1: Node{type:nodetypeB}) WHERE n<>n1 AND id(n)<id(n1) RETURN n as node1, n1 as node2, r as rel")
    # Iterate over the result inside the session; records are fetched as they are consumed.
    for record in result:
        print("node1 {}".format(record["node1"]))
driver.close()
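Since the goal is to extract the data for downstream tasks, printing each record could be replaced by writing it to a file as it arrives. The following is only a minimal sketch: write_rows and the column names name1, name2, and relType are hypothetical, and the sample rows stand in for real query results so the snippet runs without a database.

```python
import csv
import io
from typing import Iterable, Mapping, Sequence, TextIO


def write_rows(rows: Iterable[Mapping[str, object]],
               fieldnames: Sequence[str],
               out: TextIO) -> int:
    """Write rows to `out` as CSV one at a time; return the number written."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    count = 0
    for row in rows:  # consumes the iterable lazily, never builds a full list
        writer.writerow({key: row.get(key) for key in fieldnames})
        count += 1
    return count


# Stand-in data (hypothetical) so the sketch runs without a database.
sample = [
    {"name1": "a", "name2": "b", "relType": "LINKS"},
    {"name1": "a", "name2": "c", "relType": "LINKS"},
]
buffer = io.StringIO()
written = write_rows(sample, ["name1", "name2", "relType"], buffer)
print(written)
```

With the driver, one option would be to pass a generator such as (record.data() for record in result) while the session is still open, write to a file opened with newline="", and have the query return scalar columns (for example n.name AS name1, n1.name AS name2, type(r) AS relType) rather than whole nodes.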
I noticed that the query gives only the first-order connection between 2 nodes; however, I will need at least the second-order relationship for my downstream application. I have tried:
MATCH (n)-[r*1..2]-(n1)
WHERE n<>n1 AND id(n)<id(n1)
WITH n.name as Name1, n1.name as Name2, r AS rel
UNWIND rel AS rl
RETURN Name1, Name2, rl.id AS relId
which estimated about 2 million rows.
I would like to skip some rows so I can eventually end up with fewer than 1 million (~500,000). I have played with a few other clauses like SKIP and LIMIT, but I can't seem to get a helpful result.
This is technically another question, but let's do it.
In general, I don't agree with this whole DB export approach, but if it works for you, that's fine. Can you try a query like this?
MATCH p = (n)-[*1..2]-(n1)
WHERE id(n)<id(n1)
WITH n.name as Name1, n1.name as Name2, relationships(p) as rel
SKIP 10
LIMIT 10
UNWIND rel AS rl
RETURN Name1, Name2, rl.id AS relId
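Keep in mind that SKIP and LIMIT apply to the WITH step, that is, to the matched paths before the UNWIND. Since a two-hop path unwinds into two rows, the final number of rows can be larger than the LIMIT.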
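To make the ordering of SKIP/LIMIT and UNWIND concrete, here is a plain-Python analogy rather than anything Neo4j runs internally. The paths list and the function skip_limit_then_unwind are hypothetical stand-ins: each tuple plays the role of one row produced by the WITH, with its list of relationship ids.

```python
from itertools import islice

# Hypothetical stand-in for matched paths: (Name1, Name2, [relationship ids]).
paths = [
    ("a", "b", [1]),
    ("a", "c", [1, 2]),
    ("b", "c", [2]),
    ("b", "d", [3, 4]),
]


def skip_limit_then_unwind(rows, skip, limit):
    """Mimic WITH ... SKIP skip LIMIT limit followed by UNWIND rel."""
    # SKIP/LIMIT select whole paths first ...
    kept = islice(rows, skip, skip + limit)
    # ... and only then is each path expanded into one row per relationship.
    return [(n1, n2, rel_id) for n1, n2, rels in kept for rel_id in rels]


result = skip_limit_then_unwind(paths, skip=1, limit=2)
print(result)
# [('a', 'c', 1), ('a', 'c', 2), ('b', 'c', 2)]
```

Two paths are kept, yet three rows come out. Also note that without an ORDER BY before SKIP and LIMIT, Cypher does not guarantee which rows are skipped, so repeated runs are not guaranteed to return the same subset.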