Concurrency while developing algorithms with Graph Data Science API

cuneyttyler · June 2, 2022, 11:29am

Hi,

I'm trying to develop a multi-threaded version of Dijkstra Shortest-Path Algorithm for multiple source-target pairs using GDS Api. I used ParallelUtil class to run tasks in parallel. It seems to run correctly on tests. However, in my real graph which has about 1 million records, it runs very slowly. If I run the algorithm without multithreading with 5 source-target pairs it finishes in a reasonable time. With multi-threading it throws Transaction Closed error.

What I do in my code is simply to create a task for each pair and in each task, implement the Dijkstra's algorithm. So in each task, a copy of the graph is stored as well as huge long queues for algorithm's calculations. Maybe these cause a memory overhead and that's because the algorithm runs slower(even never finishes) than non-parallel version.

Do you have any ideas? Thanks.

cuneyttyler · June 2, 2022, 7:45pm

Here is my code : https://github.com/cuneyttyler/semanticspace-neo4j-gds/blob/master/algorithms/src/main/java/com/semanticspace/shortestpath/DijkstraMultiplePairs.java

I simply followed the procedure in BetwennessCentrality.java to do parallelism.

EDIT: So I realised a few things and corrected my code. However, now a weird thing happens. It seems to work in normal speed but never finishes calculating because it never reaches the target node even if it should have. The weird thing is that, when I put a log line in line 220 with '% 100' to log in every 100 counts of processed relations, it converges in the right time - it manages to process the target node so it finishes. But when I increase this log condition to '% 10000', it somehow skips the target node and cannot find a path. So it's clearly related to multi-threading. That's why I put a syncronized block on queue, though it didn't work. Also I'm trying with one source-target pair so only one additional thread is running. I'm really confused here. What can cause the algorithm to skip some relations when working too fast(not printing anything that causes it to slow down)?

michael.hunger · June 2, 2022, 7:00pm

Can you share your actual code? That would probably help a lot?

cuneyttyler · June 3, 2022, 11:11am

I found the issue : I defined local concurrent copy of graph as type 'Graph', although it wasn't like that in the other GDS core methods. I wanted to try something, but when I changed it to RelationshipIterator = graph.concurrentCopy(), now it works as desired and now we have a nice concurrent multiple pairs Dijkstra procedure.

You can access it here : https://github.com/cuneyttyler/semanticspace-neo4j-gds/

Procedure

semanticspace.gds.dijkstraMultiplePairs.stream - parameters sourceNodes(long list), targetNodes(long list)

I wonder why using Graph instance creates such an issue though.

cuneyttyler · June 3, 2022, 8:59pm

I solved the issue, here is the answer https://community.neo4j.com/t5/neo4j-graph-platform/algorithm-developed-using-gds-core-api-gives-different-results/td-p/56276#M33553

Topic		Replies	Views
Algorithm developed using GDS core api gives different results for different graph projections Neo4j Graph Platform migrated	1	153	June 3, 2022
Additional Algorithms for Graph Data Science Neo4j Graph Platform migrated	2	138	June 4, 2022
Neo4j GDS algorithms execution time Neo4j Graph Platform migrated	0	152	November 2, 2022
Neo4j GDS algorithms execution time Neo4j Graph Platform migrated	1	169	November 5, 2022
GDS Concurrency parameter doesn't impact performance Graph Algorithms/Graph Data Science performance	2	271	April 18, 2021

Get Certified in June!

Concurrency while developing algorithms with Graph Data Science API

Related topics