Hi all,
I want to understand why queries using ANY and APOC's subgraph procedures are so much faster than a plain Cypher query in Neo4j. I have a graph with about 200M nodes and a few large clusters of connected nodes. These large clusters contain up to ~5k nodes.
When I query one of these large clusters, say cluster A with 5000 nodes spread up to a depth of 5 from the queried node, using the query below, Neo4j consistently times out (after more than 5 minutes):
MATCH (n: User {user_id: 'text-id'})-[r*0..5]-(m) RETURN n, r, m
I checked the docs and found that I could use the shortest-path approach for my use case via the ANY keyword, since the docs claim that it improves performance (which it did). I modified my query and noticed significant performance gains: it returned my graph with 5000 nodes in around 10 seconds.
MATCH ANY (n: User {user_id: 'text-id'})-[r*0..5]-(m) RETURN n, r, m
Next, I tried an APOC-based approach, and it gave the same results within a couple of seconds:
MATCH (n: User {user_id: 'text-id'})
CALL apoc.path.subgraphAll(n, {maxLevel: 5, relationshipFilter: '>', bfs: true}) YIELD nodes AS nodes_, relationships AS rels_
RETURN nodes_ AS nodes, rels_ AS relationships
I got this APOC query from ChatGPT and want to understand why the APOC-based query is so much faster than my original query. Is it just because of the '>' direction filter? I tried adding a direction to my original query too, but even then it timed out. Also, why is the query with ANY so fast?
All the nodes in my graph are bidirectionally connected, i.e. if node A has an outgoing edge E1 to node B, then a corresponding edge E2 from B to A is also present.
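My unconfirmed guess is that the original query returns one row per path and only forbids reusing a relationship within a path, while the APOC procedure visits each node once. To check whether that could explain the gap, I put together a toy Python sketch. It is not Neo4j code: the function names and the triangle graph are made up for illustration, and it only models the doubled edges described above.

```python
from collections import deque

def build_adjacency(pairs):
    """Each connected pair (a, b) gets two relationships: E1 (a->b) and E2 (b->a).
    An undirected pattern may walk either relationship from either end."""
    adj = {}
    rel_id = 0
    for a, b in pairs:
        for _ in range(2):  # one id for E1, one for E2
            adj.setdefault(a, []).append((b, rel_id))
            adj.setdefault(b, []).append((a, rel_id))
            rel_id += 1
    return adj

def count_paths(adj, start, max_depth):
    """Count paths of length 0..max_depth that never reuse a relationship.
    This mimics one result row per path for a pattern like -[r*0..5]-."""
    def walk(node, depth, used):
        total = 1  # the path ending at this node
        if depth == max_depth:
            return total
        for nxt, rel in adj.get(node, []):
            if rel not in used:
                used.add(rel)
                total += walk(nxt, depth + 1, used)
                used.remove(rel)
        return total
    return walk(start, 0, set())

def reachable_nodes(adj, start, max_depth):
    """Breadth-first search that visits every node at most once."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for nxt, _ in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen

# Toy cluster: a triangle where every pair is connected in both directions.
triangle = build_adjacency([("A", "B"), ("B", "C"), ("A", "C")])
print("paths from A:", count_paths(triangle, "A", 5))
print("distinct nodes from A:", len(reachable_nodes(triangle, "A", 5)))
```

Even on this three-node toy the path count is much larger than the node count, so on a 5000-node cluster I suspect the number of paths is what causes the timeout. I would appreciate confirmation of whether this is what actually happens inside Neo4j.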
Thanks in advance!