
Apoc.periodic.iterate only writing one batch with parallel

Minyall
Node Link

I've managed to solve my issue, but in a way that doesn't seem particularly efficient.

CALL apoc.periodic.iterate(
'UNWIND $batch AS row RETURN row',
'MATCH (s:STORY), (t:ISSUE) WHERE s.id = row.id AND t.id = row.cat_id
CREATE (s)-[r:IS_TAGGED_WITH]->(t)',
{batchSize:10000, parallel:false, iterateList:true, params:{batch:$edge_list}})

This query works for around 50K relationships between STORY nodes and ISSUE nodes. I use the Python driver to pass in a list of dicts as the $edge_list parameter. However, if I set parallel:true, the procedure only writes what is probably the first batch: I end up with only 10,000 relationships.

Is this just a quirk of apoc.periodic.iterate, or can I change the query to ensure parallel works as expected?

Many thanks,

4 REPLIES

mark_needham
Neo4j

Do you see any errors when you're using the parallel version? I'm wondering if you're getting a deadlock exception because it's trying to write two relationships to the same node in parallel...
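To make that concrete, here is a rough Python sketch for checking whether the batches of an edge list would touch the same nodes. It assumes that creating a relationship takes a lock on both of its end nodes, and that rows are split into batches in list order; `batches_sharing_nodes` is a hypothetical helper name, not part of APOC or the driver.

```python
def batches_sharing_nodes(edge_list, batch_size):
    """Return index pairs of batches that touch at least one common node.

    Assumption: each row has "id" (STORY) and "cat_id" (ISSUE) keys, as in
    the query above, and batches are consecutive slices of the list.
    """
    batches = [
        edge_list[i:i + batch_size]
        for i in range(0, len(edge_list), batch_size)
    ]

    # Collect the set of nodes each batch would need to lock.
    touched = []
    for batch in batches:
        nodes = set()
        for row in batch:
            nodes.add(("STORY", row["id"]))
            nodes.add(("ISSUE", row["cat_id"]))
        touched.append(nodes)

    # Any two batches with a non-empty intersection could contend for a lock.
    return [
        (a, b)
        for a in range(len(touched))
        for b in range(a + 1, len(touched))
        if touched[a] & touched[b]
    ]


edges = [
    {"id": 1, "cat_id": "x"},
    {"id": 2, "cat_id": "y"},
    {"id": 3, "cat_id": "x"},
    {"id": 4, "cat_id": "z"},
]
overlapping = batches_sharing_nodes(edges, batch_size=2)
```

If many stories share a small set of ISSUE nodes, most batch pairs will probably show up here, which would fit the lock errors under parallel:true.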

Minyall
Node Link

Yes, after running a toy version in the browser rather than through Python, I saw the lock errors. Is there a way to rewrite the query that works around that, or is it just the nature of Neo4j?

Many thanks for your reply!

mark_needham
Neo4j

I don't think there's a way to work around it by rewriting the query, but you can set the retries parameter, which will retry up to a specified number of times if it runs into problems.

See https://neo4j.com/docs/labs/apoc/current/graph-updates/periodic-execution/#commit-batching for more details.
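As a minimal sketch of how that could look from the Python driver, assuming the same query as in the original post: `build_iterate_call` and `run_with_retries` are hypothetical helper names, and the retry count of 3 is an arbitrary choice. As far as I know, the procedure reports failed batches in its result columns instead of raising an exception, so the sketch returns `failedBatches` and `errorMessages` for inspection.

```python
def build_iterate_call(batch_size=10000, parallel=True, retries=3):
    """Build the Cypher text for apoc.periodic.iterate with a retries setting."""
    parallel_literal = "true" if parallel else "false"
    return (
        'CALL apoc.periodic.iterate('
        '"UNWIND $batch AS row RETURN row", '
        '"MATCH (s:STORY), (t:ISSUE) '
        'WHERE s.id = row.id AND t.id = row.cat_id '
        'CREATE (s)-[r:IS_TAGGED_WITH]->(t)", '
        f'{{batchSize: {int(batch_size)}, parallel: {parallel_literal}, '
        f'iterateList: true, retries: {int(retries)}, '
        'params: {batch: $edge_list}})'
    )


def run_with_retries(uri, auth, edge_list):
    """Run the call and return (batches, failedBatches, errorMessages).

    uri and auth are placeholders for your own connection details.
    """
    from neo4j import GraphDatabase  # official Neo4j Python driver

    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session() as session:
            record = session.run(
                build_iterate_call(), edge_list=edge_list
            ).single()
            return (
                record["batches"],
                record["failedBatches"],
                record["errorMessages"],
            )


query = build_iterate_call(retries=5)
```

Checking `failedBatches` after each run is probably worthwhile, since batches that fail can otherwise go unnoticed when the call is made through the driver.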

Thanks Mark! I'll keep retries in mind.
