Node Similarity Algorithm (Weighted Jaccard) WHERE syntax

parthiv3215 · January 26, 2022, 3:15pm

I am trying to run the weighted Jaccard algorithm on my graph, using the Neo4j documentation as a reference.

The code:

CALL gds.nodeSimilarity.stream('test', { relationshipWeightProperty: 'strength', similarityCutoff: 0.1 })
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Person1, gds.util.asNode(node2).name AS Person2, similarity
ORDER BY Person1

The code above runs perfectly. However, I want to filter node1 and node2 so that only the results for the nodes I need are shown. I tried adding WHERE node1.name = 'Chair1' right after my YIELD statement, but it throws an error. How do I add a WHERE clause so that I only get results for the nodes I want rather than all of them? (Even in the documentation, Node Similarity - Neo4j Graph Data Science, I see that duplicate pairs such as Alice-Dave and Dave-Alice are returned.)
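For reference, when `relationshipWeightProperty` is set, Node Similarity scores two nodes by the weighted Jaccard of their neighbour sets: the sum of the element-wise minimum weights divided by the sum of the element-wise maximum weights. The plain-Python sketch below illustrates that formula only; the function name `weighted_jaccard` and the node names and `strength`-style weights are hypothetical, and this is not the GDS implementation. Because the measure is symmetric, Alice-Dave scores exactly the same as Dave-Alice, which is why a result list built per node can contain both orderings.

```python
from typing import Dict, Hashable

# Hypothetical representation: neighbour -> relationship weight (e.g. 'strength')
Neighbours = Dict[Hashable, float]


def weighted_jaccard(a: Neighbours, b: Neighbours) -> float:
    """Sum of element-wise minima over sum of element-wise maxima.

    A neighbour missing from one side counts as weight 0 on that side.
    """
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 0.0


# Made-up weighted neighbourhoods for two nodes
alice = {"Guitar": 1.0, "Synth": 0.5}
dave = {"Guitar": 0.5, "Bongos": 1.0}

print(weighted_jaccard(alice, dave))  # 0.5 / (1.0 + 0.5 + 1.0) = 0.2
print(weighted_jaccard(dave, alice))  # same value: the measure is symmetric
```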

Cobra · January 26, 2022, 3:32pm

Hello @parthiv3215

CALL gds.nodeSimilarity.stream('test', { relationshipWeightProperty: 'strength', similarityCutoff: 0.1 })
YIELD node1, node2, similarity
WITH gds.util.asNode(node1) AS n1, gds.util.asNode(node2) AS n2, similarity
WHERE n1.name = "Chair1"
RETURN n1.name, n2.name, similarity
ORDER BY n1.name

Regards,
Cobra
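The query above works because `node1` and `node2` are yielded as internal node IDs (integers), not nodes, so `node1.name` has no property to read; resolving them with `gds.util.asNode` first gives the WHERE something to test. The Python sketch below mimics that post-filtering on a hypothetical result stream and also shows one way to drop mirrored pairs such as Alice-Dave / Dave-Alice by keeping only rows whose first ID is smaller than the second. The row tuples, the `names` lookup (standing in for `gds.util.asNode(...).name`), and the helper names are illustrative assumptions, not GDS objects.

```python
from typing import Dict, Iterable, Iterator, Tuple

# (node1 id, node2 id, similarity), mirroring the procedure's YIELD columns
Row = Tuple[int, int, float]


def filter_rows(rows: Iterable[Row], names: Dict[int, str], name: str) -> Iterator[Tuple[str, str, float]]:
    """Keep rows whose first node has the given name (the WHERE n1.name = ... step)."""
    for n1, n2, sim in rows:
        if names[n1] == name:  # resolve the ID to a name first, then compare
            yield names[n1], names[n2], sim


def drop_mirrored(rows: Iterable[Row]) -> Iterator[Row]:
    """Keep one row per unordered pair by requiring node1 < node2."""
    for n1, n2, sim in rows:
        if n1 < n2:
            yield n1, n2, sim


# Hypothetical ID -> name lookup and result rows
names = {0: "Alice", 1: "Dave", 2: "Chair1"}
rows = [(0, 1, 0.2), (1, 0, 0.2), (2, 0, 0.4), (0, 2, 0.4)]

print(list(filter_rows(rows, names, "Chair1")))  # [('Chair1', 'Alice', 0.4)]
print(list(drop_mirrored(rows)))                 # [(0, 1, 0.2), (0, 2, 0.4)]
```

In Cypher, the rough equivalent of `drop_mirrored` would be `WHERE node1 < node2` directly after `YIELD node1, node2, similarity`; treat that as a suggestion to verify against your GDS version. Because the per-node `topK` limit can keep a pair in only one direction, filtering on ID order may also drop pairs that were never mirrored.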

parthiv3215 · January 26, 2022, 3:49pm

Thank you so much. This works perfectly!

carlo.martinotti89 · May 16, 2022, 5:30am

Hello, I have a more conceptual question about exactly this kind of solution, and I can't find the answer in the official documentation. When running the exact query that Cobra kindly provided, what is happening under the hood?
Is GDS:
A) calculating ALL the similarities between every node1 and node2 and then filtering the results for Chair1 only?
OR
B) calculating ONLY the similarities between Chair1 and every other node?

I need behaviour B, but after some testing with the airport database it seems that the execution time is shorter without the WHERE clause than with it, so my nose tells me that it may be behaviour A. Is there a way to force behaviour B?
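The difference between the two behaviours is mainly the amount of work done. The brute-force Python sketch below counts similarity evaluations on a small hypothetical graph (node names and weights are made up; `behaviour_a` and `behaviour_b` are illustrative names, not GDS calls): behaviour A scores every ordered pair and filters afterwards, while behaviour B scores only pairs that start at the chosen source. Both return the same rows for Chair1; only the number of comparisons differs, which is why a post-filter cannot make the computation itself cheaper.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: weight}
Result = List[Tuple[str, str, float]]


def weighted_jaccard(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Sum of element-wise minima over sum of element-wise maxima."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 0.0


def behaviour_a(g: Graph, source: str) -> Tuple[Result, int]:
    """Score every ordered pair, then keep only rows starting at `source`."""
    scored, calls = [], 0
    for u, v in permutations(g, 2):
        calls += 1
        scored.append((u, v, weighted_jaccard(g[u], g[v])))
    return [row for row in scored if row[0] == source], calls


def behaviour_b(g: Graph, source: str) -> Tuple[Result, int]:
    """Score only the pairs that start at `source`."""
    out, calls = [], 0
    for v in g:
        if v != source:
            calls += 1
            out.append((source, v, weighted_jaccard(g[source], g[v])))
    return out, calls


# Hypothetical graph: each item points at materials with a made-up weight
g: Graph = {
    "Chair1": {"Wood": 1.0, "Metal": 0.5},
    "Chair2": {"Wood": 0.5},
    "Table1": {"Wood": 1.0, "Metal": 1.0},
    "Lamp1": {"Glass": 1.0},
}

rows_a, calls_a = behaviour_a(g, "Chair1")
rows_b, calls_b = behaviour_b(g, "Chair1")
print(rows_a == rows_b)  # True: same answer either way
print(calls_a, calls_b)  # 12 3: n*(n-1) comparisons versus n-1
```

GDS itself does more than this loop (for example it applies `topK`, `topN` and `similarityCutoff`), so treat the counts as an illustration of why post-filtering saves no computation, not as a model of its actual runtime.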

alicia_frame1 · May 17, 2022, 3:59pm

For the above code snippet, GDS is calculating all the similarities and post-filtering the results (the WHERE is applied to the result stream from the node similarity algorithm).

More sophisticated filtering for Node Similarity & KNN will be coming in the 2.1 release, so stay tuned, @carlo.martinotti89

carlo.martinotti89 · May 18, 2022, 2:34am

That's what I suspected! Welp, unfortunate, but for now the database is small enough that I can afford to do the complete calculation every time. Thank you for the reply, and I'll definitely stay tuned for new releases :)

Related topics (category · replies · views · last activity)
Node Similarity algorithm problem · Graph Data Science / Graph Analytics · 1 reply · 489 views · July 21, 2023
Graph Data Science "Node Similarity" algorithm documentation is partially unclear · Neo4j Graph Platform · 0 replies · 358 views · December 13, 2020
Graph Data Science: Filtered Node Similarity · Neo4j Graph Platform (migrated) · 2 replies · 197 views · November 16, 2022
How to use Jaccard similarity algorithm in neo4j to find the similar nodes · Procedures & APOC (cypher) · 17 replies · 4533 views · January 17, 2019
Jaccard in Alpha forever · Graph Data Science / Graph Analytics · 8 replies · 600 views · March 10, 2021
