I’m running the following Cypher query using neo4j-driver in a Node.js app several times (once per combination of node labels):
CALL apoc.periodic.iterate("
CALL apoc.load.json('file://${fileName}')
YIELD value AS line RETURN line
", "
match (a:${item[0]} {ID:line.startid}), (b:${item[1]} {ID:line.endid})
merge (a)-[${label.toLowerCase()}:${label} {ID:line.id}]->(b) ${updateStatements.join("")} return ${label.toLowerCase()}
",
{
batchSize: 1000,
iterateList: true,
parallel: false,
params: {},
concurrency: 50,
retries: 0
}
)
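(For context: the ${...} placeholders are filled in from a JavaScript template literal. Labels and relationship types can't be parameterized in Cypher, so those have to be interpolated, but the file name could instead go through the params map in the config, which APOC passes to both inner statements. A minimal sketch with one concrete label combination, where $url stands in for the interpolated fileName:
CALL apoc.periodic.iterate("
CALL apoc.load.json($url)
YIELD value AS line RETURN line
", "
match (a:Company {ID:line.startid}), (b:Person {ID:line.endid})
merge (a)-[reltype:RELTYPE {ID:line.id}]->(b)
", {batchSize: 1000, iterateList: true, parallel: false, params: {url: 'file:///Path/File.json'}})
)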
After some time, the app terminates with the following error:
query failed Failed to invoke procedure apoc.periodic.iterate: Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@ad566f2 rejected from java.util.concurrent.ThreadPoolExecutor@6e8e6e59[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 8436]
(node:17822) UnhandledPromiseRejectionWarning: Neo4jError: Failed to invoke procedure apoc.periodic.iterate: Caused by: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@ad566f2 rejected from java.util.concurrent.ThreadPoolExecutor@6e8e6e59[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 8436]
Any idea which param/config setting needs to be tweaked? I’m running the app on a MacBook Pro with 8 GB RAM. The JSON files vary from 300k to over 1M records. Thanks.
Do you have indexes on the relevant label + property combinations? If not, that could be heap-intensive. You may also consider lowering your batch size.
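If they're missing, one index per label/property pair used in your MATCH clauses will do it. On Neo4j 3.x the syntax is (with a placeholder label here, since yours are interpolated):
CREATE INDEX ON :YourLabel(ID)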
Thanks Andrew. That’s right, there are indexes on the label + property combinations. I've also tried running with lower and higher batch sizes but still get the same error. Prior to these queries, the nodes are loaded first; these queries load the relationships.
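(The node-loading step is a similar batched call. A rough sketch, with the node file fields here being assumptions rather than the actual schema:
CALL apoc.periodic.iterate("
CALL apoc.load.json($url) YIELD value AS line RETURN line
", "
MERGE (n:Company {ID: line.id})
", {batchSize: 1000, iterateList: true, parallel: false, params: {url: 'file:///Path/Nodes.json'}})
)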
A sample generated query looks something like this:
CALL apoc.periodic.iterate("
CALL apoc.load.json('file:///Path/File.json')
YIELD value AS line RETURN line
", "
match (a:Company), (b:Person) where a.ID = line.startid and b.ID = line.endid
merge (a)-[relationship:RelType {ID: line.id}]->(b) FOREACH(_ IN CASE WHEN trim(line.description) <> '' THEN [1] ELSE [] END | SET relationship.description = line.description ....
return relationship
Indexes exist on :Person(ID) and :Company(ID), and there are over 1 million nodes each for Person and Company.
You probably won't want to return anything, as the batched query in APOC won't do anything with the returned values. And you can forgo the FOREACH and just use filtering:
match (a:Company), (b:Person)
where a.ID = line.startid and b.ID = line.endid
merge (a)-[relationship:RelType {ID: line.id}]->(b)
WITH relationship, line
WHERE line.description <> ''
SET relationship.description = line.description
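Putting that together with the outer batching call, the full statement would look something like this (same config values as your original, minus the RETURN):
CALL apoc.periodic.iterate("
CALL apoc.load.json('file:///Path/File.json')
YIELD value AS line RETURN line
", "
match (a:Company), (b:Person)
where a.ID = line.startid and b.ID = line.endid
merge (a)-[relationship:RelType {ID: line.id}]->(b)
WITH relationship, line
WHERE line.description <> ''
SET relationship.description = line.description
", {batchSize: 1000, iterateList: true, parallel: false})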
Thanks Andrew for the suggestion. The queries ran significantly faster after using WITH instead of FOREACH and removing the RETURN. I re-ran them with a smaller batch size of 500, but they still failed with the same error. The error seems to come up when iterating through huge files (around 300k to 1M records).