Hi, I am trying to test almost all of the GDS algorithms to select the best one for each of about 50 different target tasks for a customer, such as finding activity patterns on the internet.
Some tasks can be accomplished by several different algorithms, each with different results or run times.
Is there any existing comparison or report where the different GDS algorithms were run on the same data, comparing their quality and performance? That kind of information would be very helpful, even though I will still have to test almost all of the algorithms myself to build an application for each task.
For example, I found that NodeSimilarity produces very good results quickly when comparing thousands of sets of news content, but it cannot be used to compare hundreds of thousands of sentences, since it takes forever on my best test machine (Ryzen 5950X, 32 threads, 128 GB RAM), unless I am doing something wrong.
I am sorry I cannot share more specific details of the tasks, since it is a very confidential project.
If you're looking for run time estimates, you can check out our configuration guide, which includes run times for certain algorithms on a specified graph (LDBC100, ~300M relationships, 1B nodes) and lists the hardware we used to generate the benchmarks. It also provides some guidance on optimizing performance. In general, though, you want to set concurrency as high as possible (EE has unlimited concurrency), and make use of parameters like degreeCutoff, topK, and topN when available.
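As a rough sketch, a Node Similarity call tuned along those lines might look like the following (the graph name `myGraph` and the specific parameter values are illustrative, not recommendations):

```cypher
// Assumes a graph named 'myGraph' has already been projected into the GDS catalog.
// degreeCutoff prunes low-degree nodes before comparison; topK bounds results per node.
CALL gds.nodeSimilarity.stream('myGraph', {
  concurrency: 32,   // match your available threads (EE lifts the concurrency limit)
  degreeCutoff: 2,   // ignore nodes with fewer than 2 relationships
  topK: 10           // keep only the 10 most similar neighbors per node
})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1), gds.util.asNode(node2), similarity
ORDER BY similarity DESC;
```

Lowering `topK` and raising `degreeCutoff` shrinks the candidate space, which is usually the biggest lever on Node Similarity's run time.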
"Quality" is a much more nuanced metric - it depends strongly on the data set you're running an algorithm on and the problem at hand. Usually we recommend tuning your algo call on a subset of the data, to make sure your parameter combination gives you sensible results, before running over the full dataset.
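One way to get such a subset is a Cypher projection that limits the node set; relationships whose endpoints fall outside the sample are skipped by default. A minimal sketch, assuming hypothetical `:Article` nodes and `:SIMILAR_TO` relationships (the graph name `sampleGraph` and the limit are placeholders):

```cypher
// Project a small sample graph for parameter tuning before a full run.
CALL gds.graph.project.cypher(
  'sampleGraph',
  'MATCH (n:Article) RETURN id(n) AS id LIMIT 10000',
  'MATCH (a:Article)-[:SIMILAR_TO]->(b:Article)
   RETURN id(a) AS source, id(b) AS target'
);
```

Once the parameter combination looks sensible on `sampleGraph`, you can re-run the same algorithm call against the full projection.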