Hi Everyone:
I have been using the Graph Algorithms over the last two months. I have spent a lot of time reading, watching videos, asking on Slack, and going through virtually every online resource I can find on how to use the algorithms. The main obstacle I am still struggling to overcome is how to choose the config parameters for the algorithms correctly, and the area I am struggling with most is the Similarity algorithms. Many of the online resources on Graph Algorithms suggest that running a similarity algorithm is an important prerequisite to running community detection algorithms. I am fairly sure that once I understand how to analyse the results of the Similarity algorithms, using the Community Detection and Centrality algorithms will fall into place.
In a nutshell, I am trying to work out how to analyse the results yielded by the similarity algorithms so that I can decide what values to set for the parameters “topK”, “similarityCutoff” and “degreeCutoff”. I understand what these parameters mean, but what I don’t know, once the results are yielded back, is:
- What result should I be looking for to indicate that I have used the correct combination of values for the parameters “topK”, “similarityCutoff” and “degreeCutoff”?
- If the results yielded back are not what I want, how do I know which parameters I should tweak to get closer to the desired results? I guess the tweaking of parameters would be an iterative process.
I have put some of my code below, along with the results returned. The counts of nodes and relationships are as follows:
Customer Nodes: 35,724
Moment Nodes: 18,863
PERFORMS Relationships: 357,503
For the first iteration I have set degreeCutoff: 1 so that Customer nodes with fewer than one Moment are excluded, and topK: 10 because that was advice I was given (but I would like to know why, and how I should tweak it based on the yielded results):
MATCH (c:Customer)-[:PERFORMS]->(m:Moment)
WITH c, collect(id(m)) AS colM
WITH {item:id(c), categories: colM} as customerData
WITH collect(customerData) as data
CALL algo.similarity.jaccard(data, {degreeCutoff: 1, write:false, writeRelationshipType:'JACCARD_SIMILARITY', topK: 10})
YIELD nodes, similarityPairs, min, max, mean, stdDev, p25, p50, p75, p90, p95, p99, p999, p100
RETURN nodes, similarityPairs, min, max, mean, stdDev, p25, p50, p75, p90, p95, p99, p999, p100
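To make sure we are talking about the same thing, here is how I currently understand the three parameters, written as a small Python sketch on toy data. This is only my reading of the parameters, not the library's implementation: the function names (jaccard, similar_pairs), the toy customers dictionary, and the choice of >= for the cutoff comparison are all my own assumptions.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_pairs(data, degree_cutoff=0, similarity_cutoff=0.0, top_k=None):
    """Return, for each item, its most similar neighbours as (score, other_item)."""
    # degreeCutoff: skip items that have fewer than this many categories
    items = {k: set(v) for k, v in data.items() if len(v) >= degree_cutoff}
    per_item = {k: [] for k in items}
    for a, b in combinations(sorted(items), 2):
        score = jaccard(items[a], items[b])
        # similarityCutoff: drop pairs that score below the threshold
        # (whether the boundary is inclusive is an assumption here)
        if score >= similarity_cutoff:
            per_item[a].append((score, b))
            per_item[b].append((score, a))
    result = {}
    for k, neighbours in per_item.items():
        # topK: keep only the k highest-scoring neighbours per item,
        # breaking ties by item name so the output is deterministic
        neighbours.sort(key=lambda t: (-t[0], t[1]))
        result[k] = neighbours if top_k is None else neighbours[:top_k]
    return result

# Hypothetical toy data: customer -> moments performed
customers = {
    "c1": ["m1", "m2", "m3"],
    "c2": ["m2", "m3", "m4"],
    "c3": ["m3"],
    "c4": [],
}

for customer, neighbours in similar_pairs(customers, degree_cutoff=1, top_k=1).items():
    print(customer, neighbours)
```

If this reading is right, raising similarityCutoff or lowering topK can only remove pairs, so both should shrink similarityPairs, while degreeCutoff changes which nodes take part at all. What I still cannot tell is which of those outcomes counts as "correct".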
So far, everything I have seen online only shows off the algorithms and their uses, but there is very little on how to choose the parameters for the algorithms and how to interpret the yielded results. I was hoping that by reaching out to you guys you might be able to help me find answers to my two questions above.
Thanks,
Johnny
