Graph Data Science Algorithms Running Very Slow and Very Heavy on Memory

Hello!

I am using the Graph Data Science library to run graph algorithms. My current goal is to find travel bands / travel sheds in a transit network graph. That is, I want to retrieve all the nodes accessible within a time limit, which is expressed in the relationships costs. I am trying to use DFS for this tasks (the code will follow.)

My testing graph currently consists of ~1,600 nodes and ~4,100 relationships. This will increase substantially later.
In the beginning it would always crash due to Java Heap overflow, following a quick spike in RAM use, so I played with the neo4j.conf parameters. I set:
dbms.memory.heap.initial_size=8g
dbms.memory.heap.max_size=8g
dbms.memory.pagecache.size=8g
dbms.tx_state.memory_allocation=OFF_HEAP

Prior to that the RAM usage would skyrocket very quickly, and now the algorithm never finishes. I have never seen databases use up so much RAM for seemingly simple tasks. I am used to databases utilizing static memory better. Can anyone recommend better configurations and/or explain to me how to get neo4j to use static memory better without creating Java Heap overflows?

Anyhow this is my procedure:

Constrains and Indices:

CREATE CONSTRAINT ON (n:Node) ASSERT n.id is unique;
CREATE INDEX ON :Node(stop_code);

Create the graph:
CALL gds.graph.create('MT', 'Node', 'TRAVEL', { relationshipProperties: 'time'})

Run DFS:

MATCH (a:Node{stop_code: 400550})
WITH id(a) AS startNode
CALL gds.alpha.dfs.stream('MT', {startNode: startNode, maxCost: 3600, relationshipWeightProperty: 'time'})
YIELD nodeIds
RETURN nodeIds

Any help would be appreciated! Also, if I could write my own algorithm in Cypher than would be great. There are more considerations and costs I would like to add. I could not figure out how to do that.

Thanks a lot!

Hi @omri - the first step I would take is to check out how much memory your in-memory graph is taking up. After you've created you graph you can use CALL gds.graph.list() and see how much RAM the graph is consuming. We don't currently support .estimate for DFS, since it's in the alpha space, but it's also likely to be fairly memory intensive on a densely connected graph (which I suspect yours is, given the ratio of nodes to edges). You could also add a maxDepth parameter to try to bound the calculations as well.

If you're keen to write your own algorithms, I'd take a look at our pregel API, which lets you easily and quickly write parallelized algorithms against the GDS infrastructure.

Thanks, Alicia. I'll update how it goes.

Hi,

How did it go? I have a similar problem, although I have a very large graph. But I'm using the maxDepth parameter as 1 so I expect to run my code within BFS limits which return results instantly. However, this is not the case. DFS runs forever.