Comparing Jaccard Similarity (Neo4j 3.4) to Node Similarity on Neo4j 3.5 and GDS 1.1.1

stu_v_kerr · April 21, 2021, 6:25pm

I am struggling to refactor Jaccard Similarity algorithms that previously ran successfully in Neo4j 3.4 to the new Node Similarity algorithm in Neo4j 3.5.26 and GDS 1.1.1. There was never a memory issue prior to using the GDS plugin; now it is blocking our progress and motivating us to look elsewhere for scale. Here are the particulars:

|graphName|nodeCount|relationshipCount|
|myJadeThemeGraph|2670295|187|

CALL gds.nodeSimilarity.stream('myJadeThemeGraph')
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Theme1, gds.util.asNode(node2).name AS Theme2, similarity
ORDER BY similarity DESCENDING, Theme1, Theme2 limit 10

My result:

Failed to invoke procedure `gds.nodeSimilarity.stream`: Caused by: java.lang.IllegalStateException: Procedure was blocked since minimum estimated memory (130 GiB) exceeds current free memory (24 GiB).

Again, executing Jaccard prior to GDS worked fine. Now GDS requires huge amounts of memory to do the same calculations.

stu_v_kerr · April 21, 2021, 8:12pm

I have reduced the size of the projection even further by executing:

CALL gds.graph.create.cypher(
	'myJadeThemeGraph',
	'MATCH (n) WHERE (n:Guest AND n.member_tier = "Jade") OR n:Theme RETURN id(n) AS id',
	'MATCH (n:Guest)-[pt:PLAYS_THEME]->(m:Theme) WHERE n.member_tier = "Jade" AND pt.weight > 10
	RETURN id(n) AS source, id(m) AS target, type(pt) AS type, pt.weight AS weight'
)

The projection is reduced to:

Node Count: 2594
Relationship Count: 187

Running the same gds.nodeSimilarity.stream() as before, the memory requirements are still exceedingly high, even when employing topK:

CALL gds.nodeSimilarity.stream('myJadeThemeGraph',{ topK: 1 })
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Theme1, gds.util.asNode(node2).name AS Theme2, similarity
ORDER BY similarity DESCENDING limit 10

Failed to invoke procedure `gds.nodeSimilarity.stream`: Caused by: java.lang.IllegalStateException: Procedure was blocked since minimum estimated memory (54 GiB) exceeds current free memory (24 GiB).

alicia_frame1 · April 22, 2021, 12:10am

Such a small graph shouldn't be triggering that error message - can you update to GDS 1.1.6 (the latest 3.5 compatible branch)?

You'll also want to make sure you don't have other in-memory graphs hanging around - you can use CALL gds.graph.list() to make sure you're not using up memory there, and drop them if they are there.
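A minimal sketch of that check, assuming the graph name used in this thread and the graph catalog procedures as documented for GDS 1.x. Run the two statements separately; the second one is only needed for projections you no longer want.

```cypher
// List every in-memory graph currently held in the graph catalog.
CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
ORDER BY nodeCount DESC;

// Drop a projection that is no longer needed so its memory can be released.
// 'myJadeThemeGraph' is the name from this thread; substitute your own.
CALL gds.graph.drop('myJadeThemeGraph')
YIELD graphName
RETURN graphName;
```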

stu_v_kerr · April 22, 2021, 5:21am

Thanks Alicia. We will upgrade to GDS 1.1.6 and increase our heap allocations as well. I will update you on the status when complete.

stu_v_kerr · April 22, 2021, 5:16pm

Well, we upgraded to GDS 1.1.6 and used the same size graph projection:

Node Count: 2594
Relationship Count: 187

Calling gds.nodeSimilarity.stream:

CALL gds.nodeSimilarity.stream('myJadeThemeGraph',{ topK: 1 })
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Theme1, gds.util.asNode(node2).name AS Theme2, similarity
ORDER BY similarity DESCENDING limit 10

This resulted in the following error (note that I really reduced the potential return by using topK):

Failed to invoke procedure `gds.nodeSimilarity.stream`: Caused by: java.lang.IllegalStateException: Procedure was blocked since minimum estimated memory (54 GiB) exceeds current free memory (31 GiB).

alicia_frame1 · April 22, 2021, 6:58pm

Hm - that seems like a bug. I've created an issue with the engineering team and we'll keep you posted.

In the meantime, you can override the memory guards by specifying sudo:TRUE in your algo config:

CALL gds.nodeSimilarity.stream('myJadeThemeGraph',{ topK: 1, sudo:TRUE })
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Theme1, gds.util.asNode(node2).name AS Theme2, similarity
ORDER BY similarity DESCENDING limit 10

That should disable the guardrails - and if you actually do have enough memory, it will run fine, if not ... then you may OOM the database.
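Before lifting the guardrails, it may help to look at what the estimator itself reports. The following is a sketch, assuming the .estimate variant of the procedure is available in the installed GDS version and accepts the same graph name and configuration as the stream call:

```cypher
// Ask GDS for its memory estimate without running the algorithm.
CALL gds.nodeSimilarity.stream.estimate('myJadeThemeGraph', { topK: 1 })
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory
RETURN nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory
```

Comparing the returned nodeCount and relationshipCount with the projection sizes quoted earlier in the thread shows which graph dimensions the estimate is based on.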

stu_v_kerr · April 22, 2021, 8:03pm

Thanks Alicia. If there is any information that you and your team needs, please reach out. Will try the sudo parameter and get back to you.

stu_v_kerr · April 22, 2021, 8:10pm

Taking the "memory guards" off, gds.nodeSimilarity.stream() completed in 30ms.
Does that confirm a "bug"?

alicia_frame1 · April 22, 2021, 8:11pm

.... pretty sure it's a bug then. We've created a card, and I'll reach out if we have any more questions!
