My version is neo4j-community-4.0.4. My loading code:
CALL apoc.periodic.iterate("
CALL apoc.load.csv('/data/20210929/test6_keyword_text_keywordOf.csv', {nullValues:['','na',false], sep:' '})
yield map as row ",
"MATCH (h:keyword{nid: row._start})
MATCH (t:text{nid: row._end})
CALL apoc.merge.relationship(h, row._type, {}, {}, t, {}) yield rel
RETURN rel", {batchSize:1000, iterateList:true, parallel:true})
With parallel:true, each time I re-run the same CSV loading code, the number of relationships increases. By 're-run' I mean that I run the code once, check the results, then restart the program to load again. Neither my code nor the CSV data changes between runs.
With parallel:false, the number of relationships stays constant across multiple runs, which is what I expect.
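For reference, this is roughly how the relationship counts can be compared between runs. It is only a sketch: it assumes the labels keyword and text from my loading code, the keywordOf type from my CSV, and the h-to-t direction that apoc.merge.relationship creates. It lists node pairs that ended up with more than one relationship between them:

```cypher
// List keyword/text pairs connected by more than one keywordOf relationship.
// Labels, relationship type and the nid property are taken from the loading
// statement and CSV above; adjust them if your data differs.
MATCH (h:keyword)-[r:keywordOf]->(t:text)
WITH h, t, count(r) AS rels
WHERE rels > 1
RETURN h.nid AS start, t.nid AS end, rels
ORDER BY rels DESC
LIMIT 20
```

If this returns no rows after a parallel:false load but returns rows after repeated parallel:true loads, the extra relationships are duplicates between the same node pairs.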
My relationship data format:
start end type
867f9c6f099 589e81c406 keywordOf
When I copy the loading statement for one CSV file into the Neo4j Browser and run it directly, without my loading code, it shows the error message below. If I re-run the same statement in the browser, the error is gone (parallel:true):
{
"total": 13,
"committed": 6,
"failed": 7,
"errors": {
"org.neo4j.graphdb.QueryExecutionException: LockClient[22639] can't wait on resource RWLock[NODE(2080), hash=895117327] since => LockClient[22639] <-[:HELD_BY]- RWLock[NODE(21381), hash=1970823408] ... LockClient[22646] ... RWLock[NODE(2080), ...]": 1,
"org.neo4j.graphdb.QueryExecutionException: LockClient[22650] can't wait on resource RWLock[NODE(186), ...] since => LockClient[22650] <-[:HELD_BY]- RWLock[NODE(3390), hash=1637932220] ... LockClient[22642] ... RWLock[NODE(186), ...]": 1,
"org.neo4j.graphdb.QueryExecutionException: LockClient[22638] can't wait on resource RWLock[NODE(40), ...] since => LockClient[22638] <-[:HELD_BY]- RWLock[NODE(343), hash=1225909397] ... LockClient[22639] ... RWLock[NODE(40), ...]": 1,
"org.neo4j.graphdb.QueryExecutionException: LockClient[22642] can't wait on resource RWLock[NODE(3729), ...] since => LockClient[22642] <-[:HELD_BY]- RWLock[NODE(20676), hash=1845973237] ... LockClient[22639] ... RWLock[NODE(21381), ...] ... LockClient[22646] ... RWLock[NODE(3729), ...]": 1,
"org.neo4j.graphdb.QueryExecutionException: LockClient[22640] can't wait on resource RWLock[NODE(3644), ...] since => LockClient[22640] <-[:HELD_BY]- RWLock[NODE(5317), hash=1804000373] ... LockClient[22638] ... RWLock[NODE(3644), ...]": 1,
"org.neo4j.graphdb.QueryExecutionException: LockClient[22644] can't wait on resource RWLock[NODE(914), ...] since => LockClient[22644] <-[:HELD_BY]- RWLock[NODE(3384), hash=1276882950] ... LockClient[22646] ... RWLock[NODE(914), ...]": 1,
"org.neo4j.graphdb.QueryExecutionException: LockClient[22643] can't wait on resource RWLock[NODE(1696), ...] since => LockClient[22643] ...
The documentation says that parallel:true can be used to speed up the load. For debugging, I also tried a small subset of nodes and relationships, but the issue cannot be reproduced there with parallel:true, probably because the data is smaller than the batch size (1000).
What are the implications of using parallel:true in apoc.periodic.iterate?