Graph Data Science Algorithms Running Very Slow and Very Heavy on Memory

omri · July 12, 2020, 3:00pm

Hello!

I am using the Graph Data Science library to run graph algorithms. My current goal is to find travel bands / travel sheds in a transit network graph. That is, I want to retrieve all the nodes accessible within a time limit, where travel time is expressed as the relationship cost. I am trying to use DFS for this task (the code follows below).

My testing graph currently consists of ~1,600 nodes and ~4,100 relationships. This will increase substantially later.
In the beginning it would always crash with a Java heap overflow after a quick spike in RAM use, so I adjusted the neo4j.conf parameters. I set:
dbms.memory.heap.initial_size=8g
dbms.memory.heap.max_size=8g
dbms.memory.pagecache.size=8g
dbms.tx_state.memory_allocation=OFF_HEAP

Prior to that the RAM usage would skyrocket very quickly, and now the algorithm never finishes. I have never seen databases use up so much RAM for seemingly simple tasks. I am used to databases utilizing static memory better. Can anyone recommend better configurations and/or explain to me how to get neo4j to use static memory better without creating Java Heap overflows?

Anyhow this is my procedure:

Constraints and Indices:

CREATE CONSTRAINT ON (n:Node) ASSERT n.id is unique;
CREATE INDEX ON :Node(stop_code);

Create the graph:
CALL gds.graph.create('MT', 'Node', 'TRAVEL', { relationshipProperties: 'time'})

Run DFS:

MATCH (a:Node{stop_code: 400550})
WITH id(a) AS startNode
CALL gds.alpha.dfs.stream('MT', {startNode: startNode, maxCost: 3600, relationshipWeightProperty: 'time'})
YIELD nodeIds
RETURN nodeIds

Any help would be appreciated! Also, if I could write my own algorithm in Cypher, that would be great. There are more considerations and costs I would like to add, but I could not figure out how to do that.

Thanks a lot!

alicia.frame · July 12, 2020, 5:06pm

Hi @omri - the first step I would take is to check how much memory your in-memory graph is taking up. After you've created your graph, you can run CALL gds.graph.list() and see how much RAM the graph is consuming. We don't currently support .estimate for DFS, since it's in the alpha space, but it's also likely to be fairly memory intensive on a densely connected graph (which I suspect yours is, given the ratio of nodes to edges). You could also add a maxDepth parameter to bound the calculations.

If you're keen to write your own algorithms, I'd take a look at our pregel API, which lets you easily and quickly write parallelized algorithms against the GDS infrastructure.
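
For reference, the travel-shed computation described in this thread (every node reachable within a time budget) can be sketched outside the database as a cost-bounded Dijkstra search. This is only an illustration in plain Python, not GDS or Pregel code: the function name travel_shed, the adjacency-dict format, and the toy network are all assumptions made for the example, and relationship costs are assumed to be non-negative.

```python
import heapq

def travel_shed(graph, start, max_cost):
    """Return {node: cheapest cost} for every node reachable within max_cost.

    graph maps each node to a list of (neighbor, cost) pairs.
    Costs are assumed to be non-negative (e.g. travel time in seconds).
    """
    best = {start: 0}
    frontier = [(0, start)]  # min-heap ordered by accumulated cost
    while frontier:
        cost, node = heapq.heappop(frontier)
        if cost > best.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was already found
        for neighbor, weight in graph.get(node, []):
            new_cost = cost + weight
            # Expand only within the budget, and only when the route improves.
            if new_cost <= max_cost and new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, neighbor))
    return best

# Hypothetical toy network; weights are travel times in seconds.
toy = {
    "A": [("B", 1200), ("C", 3000)],
    "B": [("C", 900), ("D", 2500)],
    "C": [("D", 1000)],
    "D": [],
}

shed = travel_shed(toy, "A", 3600)
print(shed)
```

Because the search never expands a route whose accumulated cost exceeds the budget, the work done is limited to the part of the network inside the travel shed. Unlike a depth-first traversal, it also records the cheapest cost to each node, so a stop is not missed just because the first route found to it was too slow.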

omri · September 25, 2020, 1:12pm

Thanks, Alicia. I'll update how it goes.

cuneyttyler · May 28, 2022, 4:52am

Hi,

How did it go? I have a similar problem, although I have a very large graph. But I'm using the maxDepth parameter as 1 so I expect to run my code within BFS limits which return results instantly. However, this is not the case. DFS runs forever.
