I am struggling to migrate Jaccard Similarity queries that ran successfully in Neo4j 3.4 to the new Node Similarity algorithm in Neo4j 3.5.26 with GDS 1.1.1. We never had a memory issue before moving to the GDS plugin; now it is blocking our progress and pushing us to look elsewhere for scale. Here are the particulars:
|graphName|nodeCount|relationshipCount|
|myJadeThemeGraph|2670295|187|
CALL gds.nodeSimilarity.stream('myJadeThemeGraph')
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Theme1, gds.util.asNode(node2).name AS Theme2, similarity
ORDER BY similarity DESCENDING, Theme1, Theme2 limit 10
My result:
Failed to invoke procedure `gds.nodeSimilarity.stream`: Caused by: java.lang.IllegalStateException: Procedure was blocked since minimum estimated memory (130 GiB) exceeds current free memory (24 GiB).
Again, running Jaccard before GDS worked fine. Now GDS estimates a huge amount of memory for the same calculation.
I have reduced the size of the projection even further by executing:
CALL gds.graph.create.cypher(
'myJadeThemeGraph',
'MATCH (n) WHERE (n:Guest AND n.member_tier = "Jade") OR n:Theme RETURN id(n) AS id',
'MATCH (n:Guest)-[pt:PLAYS_THEME]->(m:Theme) where n.member_tier = "Jade" and pt.weight > 10
RETURN id(n) AS source, id(m) as target, type(pt) as type, pt.weight as weight'
)
The projection is reduced to:
Node Count: 2594
Relationship Count: 187
Running the same gds.nodeSimilarity.stream() call as before, the estimated memory requirement is still exceedingly high, even with topK set:
CALL gds.nodeSimilarity.stream('myJadeThemeGraph',{ topK: 1 })
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Theme1, gds.util.asNode(node2).name AS Theme2, similarity
ORDER BY similarity DESCENDING limit 10
Failed to invoke procedure `gds.nodeSimilarity.stream`: Caused by: java.lang.IllegalStateException: Procedure was blocked since minimum estimated memory (54 GiB) exceeds current free memory (24 GiB).
Such a small graph shouldn't be triggering that error message - can you update to GDS 1.1.6 (the latest 3.5 compatible branch)?
You'll also want to make sure you don't have other in-memory graphs hanging around. You can use CALL gds.graph.list() to check that nothing is using up memory there, and drop any graphs you no longer need.
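As a rough sketch of that check: the queries below list what is currently in the graph catalog and then drop one entry. The name 'staleGraph' is only a placeholder for whatever leftover graph the listing shows, and the YIELD columns are limited to the ones already mentioned in this thread.

```cypher
// List the graphs currently held in the in-memory catalog, largest first.
CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
ORDER BY nodeCount DESC;

// Drop a graph that is no longer needed.
// 'staleGraph' is a hypothetical name; substitute one returned above.
CALL gds.graph.drop('staleGraph')
YIELD graphName
RETURN graphName;
```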
Thanks Alicia. We will upgrade to GDS 1.1.6 and increase our HEAP allocations as well. I will update you on status when complete.
Well, we upgraded to GDS 1.1.6 and used the same size graph projection:
Node Count: 2594
Relationship Count: 187
Calling gds.nodeSimilarity.stream:
CALL gds.nodeSimilarity.stream('myJadeThemeGraph',{ topK: 1 })
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Theme1, gds.util.asNode(node2).name AS Theme2, similarity
ORDER BY similarity DESCENDING limit 10
This resulted in the following error (note that I reduced the potential result set as far as possible by using topK):
Failed to invoke procedure `gds.nodeSimilarity.stream`: Caused by: java.lang.IllegalStateException: Procedure was blocked since minimum estimated memory (54 GiB) exceeds current free memory (31 GiB).
Hm - that seems like a bug. I've created an issue with the engineering team and we'll keep you posted.
In the meantime, you can override the memory guards by specifying sudo:TRUE in your algo config:
CALL gds.nodeSimilarity.stream('myJadeThemeGraph',{ topK: 1, sudo:TRUE })
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS Theme1, gds.util.asNode(node2).name AS Theme2, similarity
ORDER BY similarity DESCENDING limit 10
That should disable the guardrails. If you actually do have enough memory, it will run fine; if not, you may OOM the database.
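Before bypassing the guard, it may be worth looking at the estimate it is acting on. A minimal sketch, assuming the estimate mode of the procedure is available in your GDS version and takes the same configuration as the stream call:

```cypher
// Ask GDS for the memory estimate without running the algorithm.
// Assumes the same graph name and config as the blocked call above.
CALL gds.nodeSimilarity.stream.estimate('myJadeThemeGraph', { topK: 1 })
YIELD nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory
RETURN nodeCount, relationshipCount, bytesMin, bytesMax, requiredMemory;
```

If nodeCount and relationshipCount match the small projection but requiredMemory is still in the tens of GiB, that points at the estimation rather than at the graph itself.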
Thanks Alicia. If there is any information that you and your team needs, please reach out. Will try the sudo parameter and get back to you.
Taking the "memory guards" off, gds.nodeSimilarity.stream() completed in 30ms.
Does that confirm a "bug?"
.... pretty sure it's a bug then. We've created a card, and I'll reach out if we have any more questions!