GDS RandomWalk Performance Optimization

ian.seyer · May 10, 2020, 11:47pm

Hi there!

I'm running GDS 1.2.1 on Neo4j 4.3.0 in GCP (on the Bitnami image). The machine has 4 vCPUs and 26 GB of RAM, with the default Neo4j config (ulimit is 40k; heap initial size, heap max size, and pagesize are all at their defaults).

When I run the following query on a graph database with 17M nodes and 215M relationships, it takes upwards of 50s to complete (and pins all cores at 100%):

MATCH (home:Page {wikiid: "1967"})
CALL gds.alpha.randomWalk.stream({
  nodeProjection: '*',
  relationshipProjection: {Link: {type: 'Link', orientation: 'NATURAL'}},
  start: id(home),
  steps: 6,
  walks: 4
})
YIELD nodeIds
RETURN nodeIds

Is this to be expected? Are there things I can do to optimize it? This seems slower than just writing the code myself, right? Isn't it just selecting a random outbound relationship recursively?
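
For reference, the hand-written version I have in mind is roughly the sketch below. It is plain Python over an in-memory adjacency dict, not GDS or the Neo4j driver; the function names, the `links` toy graph, and the `seed` parameter are all made up for illustration, and it assumes a walk simply stops early when a node has no outbound relationships.

```python
import random


def random_walk(adjacency, start, steps, rng):
    """Follow up to `steps` randomly chosen outbound relationships from `start`."""
    path = [start]
    current = start
    for _ in range(steps):
        neighbours = adjacency.get(current, [])
        if not neighbours:
            # Dead end: no outbound relationships, so the walk stops early.
            break
        current = rng.choice(neighbours)
        path.append(current)
    return path


def random_walks(adjacency, start, steps, walks, seed=None):
    """Run `walks` independent random walks, all beginning at `start`."""
    rng = random.Random(seed)
    return [random_walk(adjacency, start, steps, rng) for _ in range(walks)]


# Toy graph standing in for the Page/Link data (hypothetical node ids).
links = {1967: [1, 2], 1: [2, 3], 2: [1967], 3: []}
result = random_walks(links, start=1967, steps=6, walks=4, seed=42)
for path in result:
    print(path)
```

Each walk only touches the nodes it actually visits, which is why I would expect the cost to depend on steps and walks rather than on the size of the whole graph.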

P.S. I have an index on wikiid.

hm873154 · April 3, 2024, 3:59pm

Hello,

Did you achieve your data sampling with this large graph?
I am looking for an example of a large graph together with its sampled version (whatever sampling algorithm was used). I would be grateful if you could share these resources.

Thanks a lot
