I've managed to solve my issue, but in a way that doesn't seem particularly efficient:
CALL apoc.periodic.iterate('UNWIND $batch AS row RETURN row',
'MATCH (s:STORY), (t:ISSUE) WHERE s.id = row.id AND t.id = row.cat_id
CREATE (s)-[r:IS_TAGGED_WITH]->(t)',
{batchSize:10000, parallel:false, iterateList:true, params:{batch:$edge_list}})
This query works for around 50K relationships between STORY nodes and ISSUE nodes. I use the Python driver to pass in a list of dicts as the $edge_list parameter. However, if I set parallel:true, the procedure seems to write only the first batch: I end up with just 10,000 relationships.
Is this just a quirk of apoc.periodic.iterate, or can I change the query to ensure parallel works as expected?
Do you see any errors when you're using the parallel version? I'm wondering if you're getting a deadlock exception because it's trying to write two relationships to the same node in parallel...
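One rough way to check whether that's plausible for your data is to see which node ids turn up in more than one batch. This is only a sketch: it assumes the batches are consecutive slices of the list you pass in, ids_shared_across_batches is a made-up helper name, and the toy edge_list just mimics the id / cat_id keys from your query.

```python
from collections import defaultdict


def ids_shared_across_batches(edge_list, batch_size, key="cat_id"):
    """Return the values of `key` that occur in more than one batch.

    Assumes (for illustration) that batch n holds rows
    n * batch_size .. (n + 1) * batch_size - 1 of edge_list.
    """
    batches_per_id = defaultdict(set)
    for index, row in enumerate(edge_list):
        batches_per_id[row[key]].add(index // batch_size)
    return {node_id for node_id, batches in batches_per_id.items() if len(batches) > 1}


# Toy data shaped like the rows in the question (values are invented).
edge_list = [
    {"id": 1, "cat_id": "a"},
    {"id": 2, "cat_id": "b"},
    {"id": 3, "cat_id": "a"},
    {"id": 4, "cat_id": "c"},
]

# ISSUE ids that two different batches would both need to write to.
shared = ids_shared_across_batches(edge_list, batch_size=2)
print(shared)
```

If that set is large for your real list (likely when many stories share a handful of issues), parallel batches will keep contending for the same nodes.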
Yes, after running a toy version in the browser rather than through Python, I saw the errors regarding the lock. Is there a way to rewrite the query that works around that, or is it just the nature of Neo4j?
I don't think there's a way to work around it by rewriting the query, but you can set the retries parameter, which will retry up to a specified number of times if it runs into problems.
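Something along these lines should work from the Python driver. Treat it as a sketch rather than a tested recipe: create_edges is a name I made up, uri and auth are whatever your connection details are, and retries=5 is an arbitrary starting value, not a recommendation.

```python
# Same query as in the question, with parallel switched on and a retries
# setting added to the config map. $retries and $edge_list are query parameters.
ITERATE_QUERY = """
CALL apoc.periodic.iterate(
  'UNWIND $batch AS row RETURN row',
  'MATCH (s:STORY), (t:ISSUE) WHERE s.id = row.id AND t.id = row.cat_id
   CREATE (s)-[r:IS_TAGGED_WITH]->(t)',
  {batchSize: 10000, parallel: true, iterateList: true,
   retries: $retries, params: {batch: $edge_list}})
YIELD batches, total, failedBatches, errorMessages
RETURN batches, total, failedBatches, errorMessages
"""


def create_edges(uri, auth, edge_list, retries=5):
    """Run the import and return the procedure's summary row as a dict."""
    # Imported here so this snippet can be loaded without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session() as session:
            result = session.run(ITERATE_QUERY, edge_list=edge_list, retries=retries)
            return result.single().data()
```

It's worth checking failedBatches and errorMessages in the returned row, since the procedure reports failed batches there instead of raising an error in Python. That may be why the parallel run looked like it silently stopped after the first batch.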