GDS RandomWalk Performance Optimization

Hi there!

I'm running GDS 1.2.1 on Neo4j 4.3.0 in GCP (using the Bitnami image). The machine has 4 vCPUs and 26 GB of RAM and uses the default Neo4j configuration (the ulimit is 40k, and the initial heap size, maximum heap size, and page cache size are all left at their defaults).

When I run the following query on a graph database with 17M nodes and 215M relationships, it takes upwards of 50 seconds to complete (and pins all four cores at 100%):

MATCH (home:Page {wikiid: "1967"})
CALL gds.alpha.randomWalk.stream({
  nodeProjection: '*',
  relationshipProjection: {Link: {type: 'Link', orientation: 'NATURAL'}},
  start: id(home),
  steps: 6,
  walks: 4
})
YIELD nodeIds
RETURN nodeIds

Is this expected? Is there anything I can do to optimize it? It seems like writing the code myself would be faster, since all the algorithm does is repeatedly select a random outbound relationship, right?
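For comparison, here is a minimal sketch of the naive approach described above. It assumes the outgoing `Link` relationships have already been loaded into an in-memory adjacency list; the `adjacency` dict, the `random_walks` function, and the toy graph are hypothetical illustrations, not part of the GDS API. Each walk starts at the home node and, for up to `steps` hops, follows a uniformly random outbound relationship. A walk that reaches a node with no outgoing links stops early, which may not match how GDS handles dead ends.

```python
import random
from typing import Dict, Hashable, List, Optional


def random_walks(
    adjacency: Dict[Hashable, List[Hashable]],
    start: Hashable,
    steps: int,
    walks: int,
    seed: Optional[int] = None,
) -> List[List[Hashable]]:
    """Return `walks` random walks of at most `steps` hops from `start`.

    `adjacency` maps each node id to the ids reachable via one outbound
    relationship (hypothetical in-memory structure, not a GDS API).
    """
    rng = random.Random(seed)
    result = []
    for _ in range(walks):
        path = [start]
        current = start
        for _ in range(steps):
            neighbours = adjacency.get(current, [])
            if not neighbours:
                # Dead end: no outbound relationship, so stop this walk early.
                break
            # Pick one outbound relationship uniformly at random.
            current = rng.choice(neighbours)
            path.append(current)
        result.append(path)
    return result


if __name__ == "__main__":
    # Toy graph standing in for Page nodes and their outgoing Link relationships.
    toy_graph = {
        "1967": ["A", "B"],
        "A": ["B", "C"],
        "B": ["1967"],
        "C": [],
    }
    for walk in random_walks(toy_graph, "1967", steps=6, walks=4, seed=42):
        print(walk)
```

Once the adjacency list exists, each walk costs only `steps` random choices. For a graph with 215M relationships, building that adjacency list is the expensive part in both time and memory.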

P.S. I have an index on wikiid.

Hello,

Did you manage to sample data from this large graph?
I am looking for an example of a large graph together with a sampled version of it (whatever sampling algorithm was used). I would be grateful if you could share these resources.

Thanks a lot