Neo4j apoc.periodic.iterate

percyjay1998 · February 26, 2024, 6:43pm

Hi, I am new to Neo4j.

I have a file that contains more than 10k rows. I was trying to load the data into Neo4j in batches using apoc.periodic.iterate. The issue is that my file has a column named "region" which contains 16 distinct values. When I create the Region nodes without apoc.periodic.iterate, 16 nodes are created, which is correct, but as soon as I create them with apoc.periodic.iterate, 90 nodes are created.

Here is the code snippet I am using:

CALL apoc.periodic.iterate('
LOAD CSV WITH HEADERS FROM "file:///test.csv" AS row RETURN row',
'WITH row
WHERE row.regionName IS NOT null
MERGE(r:Region {region_name:row.regionName}) ',
{batchSize:1000, parallel:true, iterateList:true})

Can anyone please help me out with this?

glilienfield · February 26, 2024, 7:22pm

You have “parallel” set to true. I believe this is causing a race condition where the MERGE does not always detect an existing node with a specific region_name. With parallel set to true, multiple MERGE operations with the same region_name can execute concurrently, and these merges will not recognize each other until a node already exists. There is no lock on creating nodes unless you have a uniqueness constraint on the merge property.

The solution is to change parallel to false or to try adding a uniqueness constraint on region_name. Uniqueness seems appropriate since these nodes look like reference nodes.
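
A minimal sketch of both suggestions together, assuming Neo4j 5 constraint syntax. The constraint name region_name_unique is made up for illustration; the file, label, and property names are taken from the original post.

```cypher
// Assumed constraint name; any unused name works.
CREATE CONSTRAINT region_name_unique IF NOT EXISTS
FOR (r:Region) REQUIRE r.region_name IS UNIQUE;

// Same load as in the question, but with parallel:false so that
// batches run one after another instead of concurrently.
CALL apoc.periodic.iterate(
  'LOAD CSV WITH HEADERS FROM "file:///test.csv" AS row RETURN row',
  'WITH row
   WHERE row.regionName IS NOT NULL
   MERGE (r:Region {region_name: row.regionName})',
  {batchSize: 1000, parallel: false}
);
```

If duplicate Region nodes from an earlier run are still in the database, creating the constraint will fail until they are removed.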

dana_canzano · February 27, 2024, 1:56am

@percyjay1998

What version of Neo4j are you using?

If it is v5.x, for example, why use apoc.periodic.iterate rather than just CALL { ... } IN TRANSACTIONS? See https://neo4j.com/docs/cypher-manual/current/subqueries/subqueries-in-transactions/#_loading_csv_data
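
For the Region load from the first post, the subquery form might look roughly like this. It is a sketch that has not been run against the poster's data. The :auto prefix is a Neo4j Browser command (an assumption about the client being used) that runs the query in an implicit transaction.

```cypher
:auto LOAD CSV WITH HEADERS FROM "file:///test.csv" AS row
WITH row WHERE row.regionName IS NOT NULL
CALL {
  WITH row
  // One Region node per distinct regionName value.
  MERGE (r:Region {region_name: row.regionName})
} IN TRANSACTIONS OF 1000 ROWS
```

By default the batches are committed one after another, so the concurrent MERGE problem from parallel:true should not arise here.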

percyjay1998 · March 1, 2024, 12:37pm

@glilienfield

Thank you for your suggestion. Yes, it did solve the problem of duplicate data.

percyjay1998 · March 1, 2024, 12:39pm

@dana_canzano

I tried implementing CALL, but I got an error. This is what it says:

[A query with 'CALL { ... } IN TRANSACTIONS' can only be executed in an implicit transaction, but tried to execute in an explicit transaction.]

I am currently using Neo4j Desktop, version 5.12.0.

glilienfield · March 1, 2024, 12:40pm

Start the query with :auto

percyjay1998 · March 1, 2024, 1:14pm

I tried with :auto and got a new error:

[Invalid input ':': expected
"ALTER"
"CALL"
"CREATE"
"DEALLOCATE"
"DELETE"
"DENY"
"DETACH"
"DROP"
"DRYRUN"
"ENABLE"
"FOREACH"
"GRANT"
"LOAD"
"MATCH"
"MERGE"
"OPTIONAL"
"REALLOCATE"
"REMOVE"
"RENAME"
"RETURN"
"REVOKE"
"SET"
"SHOW"
"START"
"STOP"
"TERMINATE"
"UNWIND"
"USE"
"USING"
"WITH" (line 2, column 1 (offset: 8))
":auto LOAD CSV WITH HEADERS FROM "file:///test.csv" AS row"
^]

glilienfield · March 1, 2024, 2:10pm

Can you post the entire query?

percyjay1998 · March 1, 2024, 2:27pm

PROFILE
:auto LOAD CSV WITH HEADERS FROM "file:///test.csv" AS row
WITH row WHERE row.id IS NOT NUll

CALL {
WITH row
MATCH (d:Date {date:"2024-02-26"})-[:HAS_VALUE]-(id:identity {s_id:row.id})

MERGE(id)-[:ID_HAS_COUNTRY]-(c:country {country_name:row.countryName})

}IN TRANSACTIONS OF 1000 ROWS

glilienfield · March 1, 2024, 2:32pm

You need to start the query with :auto, so move PROFILE to after :auto.
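
With that change, the query from the previous post would look something like the following. This is a sketch that only reorders the first line and tidies whitespace; the labels, properties, and date value are unchanged from the poster's query.

```cypher
:auto PROFILE
LOAD CSV WITH HEADERS FROM "file:///test.csv" AS row
WITH row WHERE row.id IS NOT NULL
CALL {
  WITH row
  MATCH (d:Date {date: "2024-02-26"})-[:HAS_VALUE]-(id:identity {s_id: row.id})
  MERGE (id)-[:ID_HAS_COUNTRY]-(c:country {country_name: row.countryName})
} IN TRANSACTIONS OF 1000 ROWS
```

The :auto prefix is handled by Neo4j Browser before the query is sent, which is why it has to be the first token; anything placed before it makes the server try to parse ":auto" as Cypher.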
