
Node Similarity Algorithm (Weighted Jaccard) WHERE syntax

parthiv3215

I am trying to run the weighted Jaccard algorithm on my graph, using the Neo4j documentation as a reference.

The code:

CALL gds.nodeSimilarity.stream('test', { relationshipWeightProperty: 'strength', similarityCutoff: 0.1 })
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Person1, gds.util.asNode(node2).name AS Person2, similarity
ORDER BY Person1
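With `relationshipWeightProperty: 'strength'`, Node Similarity uses the weighted form of Jaccard: for a pair of nodes, sum the smaller `strength` per neighbour and divide by the sum of the larger `strength` per neighbour, taken over all neighbours of either node. The sketch below illustrates that formula in plain Python; it is not GDS's implementation, and the neighbour maps are made-up data.

```python
from typing import Mapping


def weighted_jaccard(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Weighted Jaccard of two neighbourhoods given as {neighbour: weight} maps.

    Sum of the smaller weight per neighbour divided by the sum of the larger
    weight, over the union of both neighbourhoods. A neighbour missing from
    one side counts as weight 0 on that side.
    """
    keys = a.keys() | b.keys()
    numerator = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    denominator = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return numerator / denominator if denominator else 0.0


# Hypothetical neighbourhoods, with 'strength' playing the role of the weight.
chair1 = {"x": 1.0, "y": 2.0}
chair2 = {"y": 1.0, "z": 3.0}
print(weighted_jaccard(chair1, chair2))  # 1/6, about 0.167
```

With every weight set to 1.0, the same function reduces to ordinary Jaccard, |A ∩ B| / |A ∪ B|.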

The code above runs perfectly. However, I want to filter node1 and node2 so that only the nodes I need are shown. I tried adding "WHERE node1.name = 'Chair1'" right after my YIELD statement, but it throws an error. How do I add a WHERE clause so that I only get results for the nodes I want, not all of them? (Even in the documentation, Node Similarity - Neo4j Graph Data Science, I see that duplicate pairs, i.e. Alice-Dave and Dave-Alice, are returned.)
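On the duplicate pairs: the stream can report a similar pair once per direction (node1 = Alice, node2 = Dave, and the reverse). In Cypher, adding `WHERE node1 < node2` right after the `YIELD` line should keep a single direction, since `node1` and `node2` are integer node ids; that relies on both directions actually being present, which may not hold if `topK` cuts one of the nodes' result lists short. A more robust option is to normalise each pair to (smaller id, larger id) before deduplicating, sketched below in plain Python on made-up rows; `unique_pairs` is a hypothetical helper, not a GDS function.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, int, float]  # (node1, node2, similarity), integer ids as in the stream


def unique_pairs(rows: Iterable[Row]) -> List[Row]:
    """Keep one row per unordered pair {node1, node2}.

    Each pair is normalised to (smaller id, larger id), so (a, b) and (b, a)
    collapse into one row even if only one direction was streamed.
    """
    seen: Dict[Tuple[int, int], float] = {}
    for n1, n2, sim in rows:
        key = (min(n1, n2), max(n1, n2))
        seen.setdefault(key, sim)  # first occurrence wins; Jaccard is symmetric
    return [(a, b, sim) for (a, b), sim in sorted(seen.items())]


stream = [(0, 3, 0.5), (3, 0, 0.5), (2, 1, 0.2)]
print(unique_pairs(stream))  # [(0, 3, 0.5), (1, 2, 0.2)]
```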

1 ACCEPTED SOLUTION

Cobra

Hello @parthiv3215

CALL gds.nodeSimilarity.stream('test', { relationshipWeightProperty: 'strength', similarityCutoff: 0.1 })
YIELD node1, node2, similarity
// node1 and node2 are internal node ids, so resolve them to nodes before filtering on a property
WITH gds.util.asNode(node1) AS n1, gds.util.asNode(node2) AS n2, similarity
WHERE n1.name = "Chair1"
RETURN n1.name, n2.name, similarity
ORDER BY n1.name

Regards,
Cobra
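The original `WHERE node1.name = 'Chair1'` fails because `node1` and `node2` are yielded as internal node ids (integers), not nodes, so they have no `name` property; the `WITH gds.util.asNode(...)` step turns them into nodes before the `WHERE` runs. The plain-Python sketch below mirrors that resolve-then-filter order on made-up ids and names; `filter_by_name` is a hypothetical helper, not part of GDS. It sorts by similarity rather than by name, since every kept row has the same first node.

```python
from typing import Iterable, List, Mapping, Tuple


def filter_by_name(
    rows: Iterable[Tuple[int, int, float]],
    names: Mapping[int, str],
    wanted: str,
) -> List[Tuple[str, str, float]]:
    """Post-filter a similarity stream on the first node's name.

    The rows carry integer ids, so each id is resolved to a name first
    (the role gds.util.asNode plays in the query), and only then is the
    name compared. Results are sorted by similarity, highest first.
    """
    resolved = [(names[n1], names[n2], sim) for n1, n2, sim in rows]
    kept = [row for row in resolved if row[0] == wanted]
    return sorted(kept, key=lambda row: row[2], reverse=True)


names = {0: "Chair1", 1: "Chair2", 2: "Table1"}
stream = [(0, 1, 0.5), (0, 2, 0.8), (1, 0, 0.5), (2, 0, 0.8)]
print(filter_by_name(stream, names, "Chair1"))
# [('Chair1', 'Table1', 0.8), ('Chair1', 'Chair2', 0.5)]
```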


parthiv3215

Thank you so much. This works perfectly!

Hello, I have a more conceptual question about exactly this kind of solution, which I can't find an answer to in the official documentation. When running the exact query that Cobra kindly provided, what is happening under the hood?
Is GDS:
A) calculating ALL the similarities between every node1 and node2 and then filtering the results for Chair1 only?
OR
B) calculating ONLY the similarities between Chair1 and every other node?

I need behaviour B, but after some testing with the airport database, the execution time seems to be shorter without the WHERE clause than with it, so my gut tells me it is behaviour A. Is there a way to force behaviour B?

For the code snippet above, GDS calculates all the similarities and then post-filters the results (the WHERE is applied to the result stream produced by the Node Similarity algorithm).
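To make the difference between the two behaviours concrete, here is a plain-Python sketch on a tiny made-up graph that counts how many pairs each one scores. It is not how GDS works internally: it ignores `topK`, parallelism and any other GDS optimisations, and the `>= cutoff` test is this sketch's own choice. Both return the same rows for Chair1, but A scores every unordered pair, n(n-1)/2, while B scores only the n-1 pairs that involve the source.

```python
from itertools import combinations
from typing import List, Mapping, Tuple

Graph = Mapping[str, Mapping[str, float]]  # node -> {neighbour: weight}
Row = Tuple[str, str, float]


def weighted_jaccard(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Sum of min weights over sum of max weights across both neighbourhoods."""
    keys = a.keys() | b.keys()
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0


def behaviour_a(g: Graph, source: str, cutoff: float) -> Tuple[List[Row], int]:
    """Score every pair, emit both directions, then post-filter on node1."""
    rows: List[Row] = []
    comparisons = 0
    for n1, n2 in combinations(g, 2):
        comparisons += 1
        sim = weighted_jaccard(g[n1], g[n2])
        if sim >= cutoff:
            rows += [(n1, n2, sim), (n2, n1, sim)]
    return [row for row in rows if row[0] == source], comparisons


def behaviour_b(g: Graph, source: str, cutoff: float) -> Tuple[List[Row], int]:
    """Score only `source` against every other node."""
    rows: List[Row] = []
    comparisons = 0
    for other in g:
        if other == source:
            continue
        comparisons += 1
        sim = weighted_jaccard(g[source], g[other])
        if sim >= cutoff:
            rows.append((source, other, sim))
    return rows, comparisons


graph = {
    "Chair1": {"wood": 2.0, "metal": 1.0},
    "Chair2": {"wood": 1.0, "metal": 1.0},
    "Table1": {"wood": 3.0},
    "Lamp1": {"glass": 1.0},
}
rows_a, cost_a = behaviour_a(graph, "Chair1", 0.1)
rows_b, cost_b = behaviour_b(graph, "Chair1", 0.1)
print(rows_a == rows_b, cost_a, cost_b)  # True 6 3
```

In this sketch the gap widens with graph size, since A's work grows quadratically with the number of nodes while B's grows linearly.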

More sophisticated filtering for Node Similarity and KNN will be coming in the 2.1 release, so stay tuned, @carlo.martinotti89.

That's what I suspected! Unfortunate, but for now the database is small enough that I can afford to run the complete calculation every time. Thank you for the reply; I'll definitely stay tuned for future releases 🙂
