I am using Community edition 3.5 and the following query throws a heap size error. Is there anything I could change in the query before I try to increase the heap size?
CALL apoc.periodic.iterate(
"MATCH ((c1:citation) -[p1:probability]-> (:lda_topic) <-[p2:probability]- (c2:citation))
WHERE id(c1) < id(c2)
WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, 2 as precision
WITH c1, c2, w1 + w2 as w12, 10^precision as factor
WITH c1, c2, min(w12)/max(w12) as weight , factor
WITH c1, c2, 1 - round(factor * weight) / factor as weight
OPTIONAL MATCH (a:alias) -[:authored]-> (c1)
OPTIONAL MATCH (b:alias) -[:authored]-> (c2)
RETURN a, b, weight, c1.citation as citation1, c2.citation as citation2",
"CREATE (a)-[e:through_topics_WJ]->(b)
SET e.weight= weight,
e.citation1 = citation1,
e.citation2 = citation2", {batchSize:5000}) // batch size reduced since there are lot of paths - maybe filter p1.weight+p2.weight > 0.25 or up (ideally > 1 )??
YIELD batches, total, errorMessages
CALL apoc.periodic.iterate(
"MATCH ((c1:citation) -[p1:probability]-> (:lda_topic) <-[p2:probability]- (c2:citation))
WHERE id(c1) < id(c2)
WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, 2 as precision
WITH c1, c2, w1 + w2 as w12, 10^precision as factor
WITH c1, c2, min(w12)/max(w12) as weight , factor
WITH c1, c2, 1 - round(factor * weight) / factor as weight
OPTIONAL MATCH (a:alias) -[:authored]-> (c1)
OPTIONAL MATCH (b:alias) -[:authored]-> (c2)
RETURN a, b, weight, c1.citation as citation1, c2.citation as citation2",
"CREATE (a)-[e:through_topics_WJ]->(b)
SET e.weight= weight,
e.citation1 = citation1,
e.citation2 = citation2", {batchSize:1000, iterateList:true, parallel:true}) // batch size reduced since there are lot of paths
YIELD batches, total, errorMessages
I also tried the second version above, with batchSize:1000, iterateList:true and parallel:true, and still get the heap size error.
I did not create any constraints on the new relationship type, since I am using the 3.5 Community version, not Enterprise. I am also using APOC 4.0.0.1. Please help.
What is the issue with APOC 4.0.x? Some other APOC procedure queries ran fine for me, except this one and the one below (which has been running for more than an hour now):
CALL apoc.periodic.iterate(
"MATCH ((c1:citation) -[p1:probability]-> (t:lda_topic) <-[p2:probability]- (c2:citation))
WHERE id(c1) < id(c2) AND (toFloat(p1.prob) + toFloat(p2.prob) > 1)
WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, t, 2 as precision
WITH c1, c2, w1, w2, t, 10^precision as factor
WITH c1, c2, t, round(factor* (1/(2+w1+w2))) / factor as weight
OPTIONAL MATCH (a:alias) -[:authored]-> (c1)
OPTIONAL MATCH (b:alias) -[:authored]-> (c2)
RETURN a, b, weight, t, c1.citation as citation1, c2.citation as citation2",
"CREATE (a)-[e:through_topic]->(b)
SET e.weight= weight,
e.topic = t.entity_type,
e.citation1 = citation1,
e.citation2 = citation2", {batchSize:5000})
YIELD batches, total, errorMessages
Yes, you want to avoid having eager operations in the driving (outer) query, so the min() and max() are the problem. Aggregations are eager, meaning that all rows must be manifested in memory for them to take place.
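One way to check whether a driving query contains an eager operation is to prefix it with EXPLAIN and read the plan, which is produced without running the query. A minimal sketch using the pattern from the question, with the weight calculation shortened for illustration; the expectation that min() and max() appear as an EagerAggregation operator is an assumption about the plan output, and operator names can differ between versions:

```cypher
// EXPLAIN only plans the query; nothing is executed.
EXPLAIN
MATCH (c1:citation)-[p1:probability]->(:lda_topic)<-[p2:probability]-(c2:citation)
WHERE id(c1) < id(c2)
WITH c1, c2, toFloat(p1.prob) + toFloat(p2.prob) AS w12
// min() and max() group by c1, c2 and need every matching row before any result is emitted
RETURN c1, c2, min(w12) / max(w12) AS weight
```

If the plan for the driving query shows an eager operator, the rows cannot be streamed into batches and are held in memory first.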
You will need to move the aggregation part of your query (and anything that is based upon those aggregations) into the updating query.
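A possible way to do that while keeping each pair's rows together is to drive the iteration by c1 alone and run the pattern match and the aggregation in the updating query. This is a sketch, not a tested solution. The batch size of 100 is an arbitrary assumption, the fixed factor 100 stands in for 10^precision with precision 2, and MATCH replaces OPTIONAL MATCH so that pairs without an alias on both sides are skipped instead of reaching CREATE with a null node:

```cypher
CALL apoc.periodic.iterate(
  // Driving query: streams one row per citation, no aggregation
  "MATCH (c1:citation) RETURN c1",
  // Updating query: all paths for a given c1 are expanded and aggregated here
  "MATCH (c1)-[p1:probability]->(:lda_topic)<-[p2:probability]-(c2:citation)
   WHERE id(c1) < id(c2)
   WITH c1, c2, toFloat(p1.prob) + toFloat(p2.prob) AS w12
   WITH c1, c2, min(w12) / max(w12) AS ratio
   WITH c1, c2, 1 - round(100 * ratio) / 100 AS weight
   // Assumption: only pairs with an author alias on both sides get a relationship
   MATCH (a:alias)-[:authored]->(c1)
   MATCH (b:alias)-[:authored]->(c2)
   CREATE (a)-[e:through_topics_WJ]->(b)
   SET e.weight = weight,
       e.citation1 = c1.citation,
       e.citation2 = c2.citation",
  {batchSize: 100})
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
```

Because every row for a given c1 is produced inside the same batch, min() and max() see all topics shared by a pair. If the driving query returned one row per path instead, an aggregation in the updating query would only cover the rows that happen to fall into a single batch.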
As for your later query, that doesn't have aggregations in the driving query, so that shouldn't be an issue. Check that relationships are being created to confirm it's executing its batches; it may just have a lot of data to process.
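To see whether batches are being committed, a simple check is to count the new relationships from a separate session while the procedure is still running. A minimal sketch, assuming the through_topic relationship type from the later query:

```cypher
// Run in a separate session while apoc.periodic.iterate is working.
// Each batch commits in its own transaction, so this count should grow over time.
MATCH ()-[e:through_topic]->()
RETURN count(e) AS created
```

If the count stays at zero for a long time, the driving query is probably still collecting its rows and has not yet handed a batch to the updating query.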
Yes, my later query did complete successfully after some time.
Regarding my first query, I will move the aggregate functions to the updating query, like below:
CALL apoc.periodic.iterate(
"MATCH ((c1:citation) -[p1:probability]-> (:lda_topic) <-[p2:probability]- (c2:citation))
WHERE id(c1) < id(c2) < 626630
WITH c1, c2, toFloat(p1.prob) as w1, toFloat(p2.prob) as w2, 2 as precision
RETURN c1, c2, w1 + w2 as w12, 10^precision as factor",
"WITH c1, c2, min(w12)/max(w12) as weight , factor
WITH c1, c2, 1 - round(factor * weight) / factor as weight
OPTIONAL MATCH (a:alias) -[:authored]-> (c1)
OPTIONAL MATCH (b:alias) -[:authored]-> (c2)
WITH a, b, weight, c1.citation as citation1, c2.citation as citation2
CREATE (a)-[e:through_topics_WJ]->(b)
SET e.weight= weight,
e.citation1 = citation1,
e.citation2 = citation2", {batchSize:1000, iterateList:true, parallel:true}) // batch size reduced since there are lot of paths - maybe filter p1.weight+p2.weight > 0.25 or up (ideally > 1 )??
YIELD batches, total, errorMessages