The randomWalk algorithm takes forever to execute

alex_ough
Node

Hi,
I'm currently trying to play with the 'randomWalk' algorithm, but it looks like it takes forever.
The named graph I created has 108 nodes and 584 relationships, and this is the configuration I'm using:

CALL gds.alpha.randomWalk.stream('gds-events-graph', {
start: id(node),
steps: 5,
walks: 5
})
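
The call above uses `id(node)` but does not show where `node` is bound. A minimal sketch of a complete query, assuming the start node is chosen with a MATCH on the `GDSEvent` label (the node label that appears in the graph schema posted later in this thread). The `LIMIT 1` and the choice to return only `nodeIds` are illustrative assumptions, not part of the original query:

```cypher
// Assumption: pick a single start node that carries the projected label.
MATCH (node:GDSEvent)
WITH node LIMIT 1
// Same named graph and configuration as in the original call.
CALL gds.alpha.randomWalk.stream('gds-events-graph', {
  start: id(node),
  steps: 5,
  walks: 5
})
YIELD nodeIds
RETURN nodeIds
```

If the start node is not part of the named graph, the walk may not behave as expected, so it is worth confirming that the MATCH only selects nodes that were projected.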

I'm currently running that Cypher query in a Neo4j Desktop database (4.2.1) with:
dbms.memory.heap.initial_size=1G
dbms.memory.heap.max_size=4G

Can anyone help me find out why it is so slow and how to speed it up?

Thanks
Alex Ough

6 REPLIES

Is it slow, or does it never finish? If it's just slow, can you share how long it's taking?

Can you:

  • Run gds.graph.list and share the statistics from gds-events-graph (how many nodes, how many relationships), and
  • Run gds.debug.sysInfo and post what it says?
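
The two requests above could be run roughly as follows. This is a sketch for GDS 1.x; the selection of YIELD columns is an assumption about which statistics are most useful here, and both procedures return more fields than shown:

```cypher
// Statistics for the named graph (run as its own statement).
CALL gds.graph.list('gds-events-graph')
YIELD graphName, nodeCount, relationshipCount, density, degreeDistribution, schema
RETURN graphName, nodeCount, relationshipCount, density, degreeDistribution, schema;

// System and library information (run as a separate statement).
CALL gds.debug.sysInfo();
```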

The additional information will make it much easier to pinpoint the problem!

alex_ough
Node

It never completes, even after 30 minutes.
I think I already mentioned the number of nodes and relationships, but here is what you asked for:

degreeDistribution:
{
  "p99": 14,
  "min": 1,
  "max": 17,
  "mean": 5.407407407407407,
  "p90": 11,
  "p50": 4,
  "p999": 17,
  "p95": 11,
  "p75": 8
}
nodeCount: 108
relationshipCount: 584
density: 0.05053651782623745
schema:
{
  "relationships": {
    "ALL": {
      "weight": "Float"
    }
  },
  "nodes": {
    "GDSEvent": {}
  }
}

gds_sysinfo.json.txt (5.9 KB)

alex_ough
Node

Btw, is there any way to see some kind of progress bar?

Thanks
Alex Ough

There is progress logging in GDS! In 1.6 you need to enable it, but it's on by default in GDS 1.7 (and greatly improved).

My first recommendation, since everything else seems OK, would be to upgrade your library to GDS 1.6.5 (the current GA release) or the preview of GDS 1.7.0 and try again. Also, you may want to check your Neo4j debug logs to make sure there aren't any error messages reported there.
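
As a rough sketch of checking progress while the algorithm runs in another session: the procedure name below is the beta-tier progress listing in GDS 1.6/1.7. The assumption here is that on 1.6 it only returns rows once progress tracking has been enabled in neo4j.conf (to my recollection the setting is `gds.progress_tracking_enabled=true`, but please verify it against the documentation for your version):

```cypher
// Lists currently running GDS tasks and their progress.
// Run this in a second session while the algorithm is executing.
CALL gds.beta.listProgress();
```

An empty result can mean either that tracking is not enabled or that no task is running at that moment.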

alex_ough
Node

I upgraded the library to 1.6.5 and tried again, but 'listProgress' returns nothing, as shown in the attachment.

And I see this message in the log file.

2021-09-23 01:19:27.537+0000 INFO RandomWalkProc: overall memory usage 0 Bytes
Exception in thread "Thread-20" java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 108
at org.neo4j.graphalgo.core.utils.paged.HugeIntArray$SingleHugeIntArray.get(HugeIntArray.java:280)
at org.neo4j.graphalgo.core.huge.TransientAdjacencyList.degree(TransientAdjacencyList.java:154)
at org.neo4j.graphalgo.core.huge.HugeGraph.degree(HugeGraph.java:316)
at org.neo4j.graphalgo.impl.walking.RandomWalk$RandomNextNodeStrategy.getNextNode(RandomWalk.java:163)
at org.neo4j.graphalgo.impl.walking.RandomWalk.doWalk(RandomWalk.java:117)
at org.neo4j.graphalgo.impl.walking.RandomWalk.lambda$compute$0(RandomWalk.java:90)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
at org.neo4j.internal.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:110)

Any thoughts?
Thanks

It looks like the problem is that you're getting an ArrayIndexOutOfBoundsException (AIOBE) and the error isn't being carried over into the Browser, so the query appears to hang instead of failing. Can you open an issue on our GitHub and attach what you've posted here? Then we can look at fixing it.