Neo4j apoc.periodic.iterate

Hi, I am new to Neo4j.

I have a file that contains more than 10k rows, and I am trying to load the data into Neo4j in batches using apoc.periodic.iterate. The problem is that the file has a column named "region" that contains 16 distinct values. When I create the Region nodes without apoc.periodic.iterate, I get 16 nodes, which is correct, but as soon as I create them with apoc.periodic.iterate, I get 90 nodes.

Here is the code snippet I am using:

CALL apoc.periodic.iterate('
LOAD CSV WITH HEADERS FROM "file:///test.csv" AS row RETURN row',
'WITH row
WHERE row.regionName IS NOT NULL
MERGE (r:Region {region_name: row.regionName})',
{batchSize:1000, parallel:true, iterateList:true})

Can anyone please help me out with this?

You have “parallel” set to true. I believe this is causing a race condition in which MERGE does not always detect an existing node with a specific region_name. With parallel set to true, multiple MERGE operations for the same region_name can execute concurrently in different batches, and none of them sees the node until another has already created it, so several of them can create the same node. This happens because there is no lock on creating nodes unless you have a uniqueness constraint on the merge property.

The solution is to set parallel to false, or to add a uniqueness constraint on region_name. A uniqueness constraint seems appropriate, since these look like reference nodes.
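
As a rough sketch of both options, assuming Neo4j 5.x constraint syntax (the version had not been confirmed at this point in the thread) and a made-up constraint name, region_name_unique:

```cypher
// Assumption: Neo4j 5.x syntax. The constraint name is hypothetical.
// With this constraint in place, two Region nodes can never share a region_name.
CREATE CONSTRAINT region_name_unique IF NOT EXISTS
FOR (r:Region) REQUIRE r.region_name IS UNIQUE;

// Same load as in the question, but with parallel:false so the
// batches run one after another instead of concurrently.
CALL apoc.periodic.iterate(
  'LOAD CSV WITH HEADERS FROM "file:///test.csv" AS row RETURN row',
  'WITH row
   WHERE row.regionName IS NOT NULL
   MERGE (r:Region {region_name: row.regionName})',
  {batchSize: 1000, parallel: false, iterateList: true}
);
```

Note that creating the constraint will fail if duplicate Region nodes already exist, so the 90 nodes from the earlier run would need to be cleaned up first.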

@percyjay1998

What version of Neo4j?

If it is v5.x, for example, why use apoc.periodic.iterate rather than just CALL { ... } IN TRANSACTIONS? See https://neo4j.com/docs/cypher-manual/current/subqueries/subqueries-in-transactions/#_loading_csv_data

@glilienfield

Thank you for your suggestion. Yes, it did solve the problem of duplicate data.

@dana_canzano

I tried implementing CALL, but I got an error. This is what it says:

[A query with 'CALL { ... } IN TRANSACTIONS' can only be executed in an implicit transaction, but tried to execute in an explicit transaction.]

I am currently using Neo4j Desktop, version 5.12.0.

Start the query with :auto

I tried with :auto, and I got a new error:

[Invalid input ':': expected
"ALTER"
"CALL"
"CREATE"
"DEALLOCATE"
"DELETE"
"DENY"
"DETACH"
"DROP"
"DRYRUN"
"ENABLE"
"FOREACH"
"GRANT"
"LOAD"
"MATCH"
"MERGE"
"OPTIONAL"
"REALLOCATE"
"REMOVE"
"RENAME"
"RETURN"
"REVOKE"
"SET"
"SHOW"
"START"
"STOP"
"TERMINATE"
"UNWIND"
"USE"
"USING"
"WITH" (line 2, column 1 (offset: 8))
":auto LOAD CSV WITH HEADERS FROM "file:///test.csv" AS row"
^]

Can you post the entire query?

PROFILE
:auto LOAD CSV WITH HEADERS FROM "file:///test.csv" AS row
WITH row WHERE row.id IS NOT NUll

CALL {
WITH row
MATCH (d:Date {date:"2024-02-26"})-[:HAS_VALUE]-(id:identity {s_id:row.id})

MERGE(id)-[:ID_HAS_COUNTRY]-(c:country {country_name:row.countryName})

}IN TRANSACTIONS OF 1000 ROWS

You need to start the query with :auto, so move PROFILE to after :auto.
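
For illustration, the reordered query would look roughly like this. It is the query posted above with only the first two lines swapped and the spacing tidied; I have not run it against the actual data:

```cypher
:auto PROFILE
// :auto comes first so Neo4j Browser runs the statement as an implicit
// (auto-commit) transaction, which CALL { ... } IN TRANSACTIONS requires.
LOAD CSV WITH HEADERS FROM "file:///test.csv" AS row
WITH row WHERE row.id IS NOT NULL
CALL {
  WITH row
  MATCH (d:Date {date: "2024-02-26"})-[:HAS_VALUE]-(id:identity {s_id: row.id})
  MERGE (id)-[:ID_HAS_COUNTRY]-(c:country {country_name: row.countryName})
} IN TRANSACTIONS OF 1000 ROWS
```

:auto is a Neo4j Browser command rather than Cypher, which is why the parser rejected the ':' when PROFILE came before it.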