Running out of heap memory size

lavanya_kannan · February 20, 2020, 2:46am

Hi,

I am using Community Edition 3.5, and the following query throws a heap size error. Is there anything I could change in the query before I try increasing the heap size?

CALL apoc.periodic.iterate(
	"MATCH  ((c1:citation) -[p1:probability]-> (:lda_topic) <-[p2:probability]- (c2:citation))
	WHERE id(c1) < id(c2) 
	WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, 2 as precision
    WITH c1, c2, w1 + w2 as w12, 10^precision as factor
    WITH c1, c2, min(w12)/max(w12) as weight , factor
	WITH c1, c2, 1 - round(factor * weight) / factor as weight
	OPTIONAL MATCH (a:alias) -[:authored]-> (c1) 
	OPTIONAL MATCH (b:alias) -[:authored]-> (c2) 
	RETURN a, b, weight, c1.citation as citation1, c2.citation as citation2",
	"CREATE (a)-[e:through_topics_WJ]->(b)
	SET e.weight= weight, 
	 e.citation1 = citation1,
	 e.citation2 = citation2", {batchSize:5000})           // batch size reduced since there are a lot of paths - maybe filter  p1.weight+p2.weight > 0.25 or up (ideally > 1 )??
YIELD batches, total, errorMessages

Thanks

Kailash · February 20, 2020, 5:44am

Try This - there is a similar thread

lavanya_kannan · February 20, 2020, 3:07pm

@Kailash @ganesanmithun323 @andrew_bowman thanks for the suggestions:

I tried the following:

Increased the heap size in the neo4j.conf file:

dbms.memory.heap.initial_size=8G
dbms.memory.heap.max_size=8G
dbms.memory.pagecache.size=8G

and ran

CALL apoc.periodic.iterate(
	"MATCH  ((c1:citation) -[p1:probability]-> (:lda_topic) <-[p2:probability]- (c2:citation))
	WHERE id(c1) < id(c2) 
	WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, 2 as precision
    WITH c1, c2, w1 + w2 as w12, 10^precision as factor
    WITH c1, c2, min(w12)/max(w12) as weight , factor
	WITH c1, c2, 1 - round(factor * weight) / factor as weight
	OPTIONAL MATCH (a:alias) -[:authored]-> (c1) 
	OPTIONAL MATCH (b:alias) -[:authored]-> (c2) 
	RETURN a, b, weight, c1.citation as citation1, c2.citation as citation2",
	"CREATE (a)-[e:through_topics_WJ]->(b)
	SET e.weight= weight, 
	 e.citation1 = citation1,
	 e.citation2 = citation2", {batchSize:1000, iterateList:true, parallel:true})           // batch size reduced since there are a lot of paths
YIELD batches, total, errorMessages

and still got the heap size error.

I did not create any constraints on the newly created relationship, since I am using the 3.5 Community version, not the Enterprise version. Also, I am using APOC 4.0.0.1. Please help.

Thanks,
Lavanya

ganesanmithun323 · February 20, 2020, 3:16pm

Hi, don't use APOC 4.0.x. It looks like there is an issue with it. Try an older APOC jar.

lavanya_kannan · February 20, 2020, 4:02pm

What is the issue with APOC 4.0.x? Some other APOC procedure queries ran fine for me, except this one and the one below (which has been running for more than an hour now):

CALL apoc.periodic.iterate(
	"MATCH  ((c1:citation) -[p1:probability]-> (t:lda_topic) <-[p2:probability]- (c2:citation))
	WHERE id(c1) < id(c2) AND (toFloat(p1.prob) + toFloat(p2.prob) > 1)
	WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, t, 2 as precision
	WITH c1, c2, w1, w2, t, 10^precision as factor
	WITH c1, c2, t, round(factor* (1/(2+w1+w2))) / factor as weight
	OPTIONAL MATCH (a:alias) -[:authored]-> (c1) 
	OPTIONAL MATCH (b:alias) -[:authored]-> (c2) 
	RETURN a, b, weight, t, c1.citation as citation1, c2.citation as citation2",
	"CREATE (a)-[e:through_topic]->(b)
	SET e.weight= weight, 
	 e.topic = t.entity_type,
	 e.citation1 = citation1,
	 e.citation2 = citation2", {batchSize:5000})          
YIELD batches, total, errorMessages

Let me know if this is related to APOC 4.0.x.

Thanks,
Lavanya

ganesanmithun323 · February 20, 2020, 4:03pm

I am not very certain about it, but it's mentioned in the thread shared by Kailash.

andrew_bowman · February 20, 2020, 7:44pm

Yes, you want to avoid having eager operations in the driving (outer) query, so the min() and max() are the problem. Aggregations are eager, meaning that all rows must be manifested in memory for them to take place.

You will need to move the aggregation part of your query (and anything that is based upon those aggregations) into the updating query.
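
To see the eager step for yourself, one option is to prefix the driving query with EXPLAIN, which plans the query without running it. The sketch below is a simplified version of the driving query from the first post (the rounding and the OPTIONAL MATCH steps are left out for brevity); the plan it produces should contain an EagerAggregation operator for the min()/max() step.

```cypher
// EXPLAIN only plans the query; nothing is executed and no data is changed.
// Simplified from the original driving query (assumption: rounding and
// the OPTIONAL MATCH clauses are omitted to keep the plan short).
EXPLAIN
MATCH (c1:citation)-[p1:probability]->(:lda_topic)<-[p2:probability]-(c2:citation)
WHERE id(c1) < id(c2)
WITH c1, c2, toFloat(p1.prob) + toFloat(p2.prob) AS w12
// Aggregating per (c1, c2) pair is what forces all rows to be held in memory.
WITH c1, c2, min(w12) / max(w12) AS weight
RETURN c1, c2, weight
```

If the aggregation lines are removed from the driving query, the same check should no longer show that operator.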

As for your later query, that doesn't have aggregations in the driving query, so that shouldn't be an issue. Check node creation to ensure it's executing its batches; it just might have a lot of data to process.
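
One way to check that the batches are executing is to count the new relationships from a separate session while the procedure is still running. This is a minimal sketch, assuming the `through_topic` relationship type from the later query and that each batch is committed in its own transaction; the `created_so_far` alias is just an illustrative name.

```cypher
// Run in a separate session while apoc.periodic.iterate is still going.
// A count that keeps growing between runs suggests batches are being committed.
MATCH ()-[e:through_topic]->()
RETURN count(e) AS created_so_far
```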

lavanya_kannan · February 20, 2020, 8:32pm

@andrew_bowman

Yes, my later query did complete successfully after some time.

Regarding my first query, I will move the aggregate functions to the updating query - like below:

CALL apoc.periodic.iterate(
	"MATCH  ((c1:citation) -[p1:probability]-> (:lda_topic) <-[p2:probability]- (c2:citation))
	WHERE id(c1) < id(c2) < 626630
	WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, 2 as precision
    RETURN c1, c2, w1 + w2 as w12, 10^precision as factor",
    "WITH c1, c2, min(w12)/max(w12) as weight , factor
	WITH c1, c2, 1 - round(factor * weight) / factor as weight
	OPTIONAL MATCH (a:alias) -[:authored]-> (c1) 
	OPTIONAL MATCH (b:alias) -[:authored]-> (c2) 
	WITH a, b, weight, c1.citation as citation1, c2.citation as citation2
	CREATE (a)-[e:through_topics_WJ]->(b)
	SET e.weight= weight, 
	 e.citation1 = citation1,
	 e.citation2 = citation2", {batchSize:1000, iterateList:true, parallel:true})           // batch size reduced since there are a lot of paths - maybe filter  p1.weight+p2.weight > 0.25 or up (ideally > 1 )??
YIELD batches, total, errorMessages

and update here. Thanks.

lavanya_kannan · February 20, 2020, 9:11pm

Update: Ran within a minute!
