apoc.periodic.iterate only writes one batch with parallel:true

Minyall
Node Link

I've managed to solve my issue, but in a way that doesn't seem particularly efficient.

CALL apoc.periodic.iterate(
  'UNWIND $batch AS row RETURN row',
  'MATCH (s:STORY), (t:ISSUE) WHERE s.id = row.id AND t.id = row.cat_id
   CREATE (s)-[r:IS_TAGGED_WITH]->(t)',
  {batchSize:10000, parallel:false, iterateList:true, params:{batch:$edge_list}})

This query works for around 50K relationships between STORY nodes and ISSUE nodes. I use the Python driver to pass in a list of dicts as the $edge_list parameter. However, if I set parallel:true, the procedure only writes what is probably the first batch, i.e. only 10,000 relationships are created.

Is this just a quirk of apoc.periodic.iterate, or can I change the query to ensure parallel works as expected?

Many thanks,

4 REPLIES

mark_needham
Neo4j

Do you see any errors when you're using the parallel version? I'm wondering if you're getting a deadlock exception because it's trying to write two relationships to the same node in parallel...
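
One rough way to gauge how likely that is would be to count the nodes that appear in more than one batch. The sketch below is illustrative only: `ids_shared_across_batches` is a hypothetical helper, not part of APOC or the driver. It assumes the edge list is a list of dicts with `id` and `cat_id` keys, as in the question, and that batches are consecutive slices of `batchSize` rows.

```python
from collections import defaultdict


def ids_shared_across_batches(edge_list, batch_size):
    """Return the endpoint nodes that occur in more than one batch.

    Assumption: batches are consecutive slices of `batch_size` rows.
    """
    batches_per_node = defaultdict(set)
    for index, row in enumerate(edge_list):
        batch_no = index // batch_size
        # Label-qualified keys keep STORY ids and ISSUE ids apart.
        batches_per_node[("STORY", row["id"])].add(batch_no)
        batches_per_node[("ISSUE", row["cat_id"])].add(batch_no)
    # Keep only the nodes touched by two or more batches.
    return {
        node: batches
        for node, batches in batches_per_node.items()
        if len(batches) > 1
    }


# Tiny made-up edge list, purely for illustration.
edges = [
    {"id": 1, "cat_id": 100},
    {"id": 2, "cat_id": 100},
    {"id": 3, "cat_id": 200},
    {"id": 4, "cat_id": 100},
]
shared = ids_shared_across_batches(edges, batch_size=2)
print(shared)
```

If many stories point at the same few ISSUE nodes, most batches will share at least one node, so parallel batches would compete for the same locks.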

Minyall

Yes, after running a toy version in the browser rather than through Python, I saw the errors regarding the lock. Is there a way to rewrite the query that works around that, or is it just the nature of Neo4j?

Many thanks for your reply!
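
The procedure reports failed batches in its result columns (for example failedBatches and errorMessages) rather than raising, which may be why nothing surfaced through Python. Below is a minimal sketch for checking those columns. It assumes `session` is an open session from the official `neo4j` Python driver; `check_iterate_result` is a hypothetical helper name.

```python
QUERY = """
CALL apoc.periodic.iterate(
  'UNWIND $batch AS row RETURN row',
  'MATCH (s:STORY), (t:ISSUE) WHERE s.id = row.id AND t.id = row.cat_id
   CREATE (s)-[r:IS_TAGGED_WITH]->(t)',
  {batchSize: 10000, parallel: true, params: {batch: $edge_list}})
YIELD batches, total, failedBatches, errorMessages
RETURN batches, total, failedBatches, errorMessages
"""


def check_iterate_result(session, edge_list):
    """Run the import and raise if the procedure reports failed batches."""
    # `session` is assumed to be a neo4j driver session (or anything with
    # a compatible run(...).single() interface).
    record = session.run(QUERY, edge_list=edge_list).single()
    if record["failedBatches"] > 0:
        raise RuntimeError(
            f"{record['failedBatches']} of {record['batches']} batches failed: "
            f"{record['errorMessages']}"
        )
    return record["total"]
```

It would be called as `check_iterate_result(session, edge_list)` inside a `with driver.session() as session:` block.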

mark_needham

I don't think there's a way to work around it by rewriting the query, but you can set the retries parameter, which will retry up to a specified number of times if it runs into problems.

See https://neo4j.com/docs/labs/apoc/current/graph-updates/periodic-execution/#commit-batching for more details.
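
As a sketch of what that could look like from the Python driver, the snippet below passes `retries` in the config map and returns the `retries` and `failedBatches` columns. It assumes `session` is an open `neo4j` driver session; `import_with_retries` and the default of 3 retries are illustrative choices, not recommendations from the docs.

```python
RETRY_QUERY = """
CALL apoc.periodic.iterate(
  'UNWIND $batch AS row RETURN row',
  'MATCH (s:STORY), (t:ISSUE) WHERE s.id = row.id AND t.id = row.cat_id
   CREATE (s)-[r:IS_TAGGED_WITH]->(t)',
  {batchSize: 10000, parallel: true, retries: $retries,
   params: {batch: $edge_list}})
YIELD batches, total, failedBatches, retries, errorMessages
RETURN batches, total, failedBatches, retries, errorMessages
"""


def import_with_retries(session, edge_list, retries=3):
    """Run the parallel import, letting APOC retry failed batches."""
    # $retries and $edge_list are supplied as ordinary query parameters.
    record = session.run(
        RETRY_QUERY, edge_list=edge_list, retries=retries
    ).single()
    return {
        "batches": record["batches"],
        "failed_batches": record["failedBatches"],
        "retries_used": record["retries"],
        "errors": record["errorMessages"],
    }
```

If `failed_batches` is still above zero after the retries, a larger retry count or falling back to parallel:false are the remaining options mentioned in this thread.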

Thanks Mark! I'll keep retries in mind.
