Hello!
I am using the Graph Data Science library to run graph algorithms. My current goal is to find travel bands / travel sheds in a transit network graph. That is, I want to retrieve all the nodes accessible within a time limit, which is expressed in the relationships costs. I am trying to use DFS for this tasks (the code will follow.)
My testing graph currently consists of ~1,600 nodes and ~4,100 relationships. This will increase substantially later.
In the beginning it would always crash due to Java Heap overflow, following a quick spike in RAM use, so I played with the neo4j.conf parameters. I set:
dbms.memory.heap.initial_size=8g
dbms.memory.heap.max_size=8g
dbms.memory.pagecache.size=8g
dbms.tx_state.memory_allocation=OFF_HEAP
Prior to that the RAM usage would skyrocket very quickly, and now the algorithm never finishes. I have never seen databases use up so much RAM for seemingly simple tasks. I am used to databases utilizing static memory better. Can anyone recommend better configurations and/or explain to me how to get neo4j to use static memory better without creating Java Heap overflows?
Anyhow this is my procedure:
Constrains and Indices:
CREATE CONSTRAINT ON (n:Node) ASSERT n.id is unique;
CREATE INDEX ON :Node(stop_code);
Create the graph:
CALL gds.graph.create('MT', 'Node', 'TRAVEL', { relationshipProperties: 'time'})
Run DFS:
MATCH (a:Node{stop_code: 400550})
WITH id(a) AS startNode
CALL gds.alpha.dfs.stream('MT', {startNode: startNode, maxCost: 3600, relationshipWeightProperty: 'time'})
YIELD nodeIds
RETURN nodeIds
Any help would be appreciated! Also, if I could write my own algorithm in Cypher than would be great. There are more considerations and costs I would like to add. I could not figure out how to do that.
Thanks a lot!