Help with `apoc.periodic.iterate`

mckenzma · March 13, 2019, 1:26am

I have been working to do a massive data import using in another post. As I have been trying to figure this out I have encountered the following error:

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure `apoc.periodic.iterate`: Caused by: java.lang.ArrayIndexOutOfBoundsException: 131

Does anyone know what this might be referring to?
Below is the code that caused the error:

MATCH (url:URL)
WITH collect(url.url) AS fileURLS
UNWIND fileURLS AS fileURL
WITH fileURL
//RETURN fileURL
CALL apoc.periodic.iterate(
'
CALL apoc.load.csv($url) YIELD map AS row RETURN row
','
MERGE (state:State {id: row.STATE_CODE_001})
ON CREATE SET state.name = row.STATE_CODE_001
',
{batchSize:1000, parallel:false, params:{url:fileURL}}) YIELD batches, total
RETURN batches, total

stefan.armbruster · March 13, 2019, 7:56pm

does it work if you run your apoc.periodic.iterate statement on only one single file?

mckenzma · March 13, 2019, 8:03pm

Yes. For example the following code runs just fine:

CALL apoc.periodic.iterate(
'
CALL apoc.load.csv("https://docs.google.com/spreadsheets/d/1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A/export?format=csv&id=1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A&gid=1318941318") YIELD map AS row RETURN row
//LOAD CSV WITH HEADERS FROM "https://docs.google.com/spreadsheets/d/1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A/export?format=csv&id=1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A&gid=1318941318" AS row RETURN row
','
WITH CASE
	WHEN NOT row.Year IS NULL THEN collect(row.URL)
    END AS fileURLs
UNWIND fileURLs AS fileURL
CREATE (url:URL {url: fileURL})
',
{batchSize: 100, parallel:false})

stefan.armbruster · March 13, 2019, 8:10pm

My suspicion is there might be a bug causing apoc.periodic.iterate not to clean up things propertly. Most people using that procedure use it only once per transaction. In your code, you're calling it multiple times.

It would be cool if you could reproduce this in a minimal example and file a github ticket for it.

mckenzma · March 13, 2019, 8:17pm

If you run the snippet of code you just looked at:

CALL apoc.periodic.iterate(
'
CALL apoc.load.csv("https://docs.google.com/spreadsheets/d/1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A/export?format=csv&id=1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A&gid=1318941318") YIELD map AS row RETURN row
//LOAD CSV WITH HEADERS FROM "https://docs.google.com/spreadsheets/d/1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A/export?format=csv&id=1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A&gid=1318941318" AS row RETURN row
','
WITH CASE
	WHEN NOT row.Year IS NULL THEN collect(row.URL)
    END AS fileURLs
UNWIND fileURLs AS fileURL
CREATE (url:URL {url: fileURL})
',
{batchSize: 100, parallel:false})

this will create set of nodes containing URLs of files I want to import. This is connected to a shared google sheets file.

Then if you try running the original lines of code in this thread:

MATCH (url:URL)
WITH collect(url.url) AS fileURLS
UNWIND fileURLS AS fileURL
WITH fileURL
//RETURN fileURL
CALL apoc.periodic.iterate(
'
CALL apoc.load.csv($url) YIELD map AS row RETURN row
','
MERGE (state:State {id: row.STATE_CODE_001})
ON CREATE SET state.name = row.STATE_CODE_001
',
{batchSize:1000, parallel:false, params:{url:fileURL}}) YIELD batches, total
RETURN batches, total

you will query the nodes to try and load data from the file locations attached to each node. This should let you recreate the issue where you are.

This issue is connected to a previous post I made.

Essentially what I am trying to accomplish is to load a Google Sheets/CSV file containing the web url file locations for large sets of data I want to import.

mckenzma · March 13, 2019, 8:27pm

@stefan.armbruster I just realized that the CASE statement I use in

CALL apoc.periodic.iterate(
'
CALL apoc.load.csv("https://docs.google.com/spreadsheets/d/1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A/export?format=csv&id=1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A&gid=1318941318") YIELD map AS row RETURN row
//LOAD CSV WITH HEADERS FROM "https://docs.google.com/spreadsheets/d/1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A/export?format=csv&id=1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A&gid=1318941318" AS row RETURN row
','
WITH CASE
	WHEN NOT row.Year IS NULL THEN collect(row.URL)
    END AS fileURLs
UNWIND fileURLs AS fileURL
CREATE (url:URL {url: fileURL})
',
{batchSize: 100, parallel:false})

is not working as intended and is likely the culprit of the issue. The Google Sheet that has the list of URLs has NULL because I still have to validate the URL before loading the data. My goal was to ignore this rows with the WITH CASE, but that isn't happening.

However, once I add `WHERE NOT url.url CONTAINS "nbi//" to avoid that error, I end back up with the initial error

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure `apoc.periodic.iterate`: Caused by: java.lang.ArrayIndexOutOfBoundsException: 131

michael.hunger · March 14, 2019, 12:03pm

it's an issue with apoc.load.csv and your data, we'll have to look into that. see the GH issue.

mckenzma · March 14, 2019, 3:16pm

@michael.hunger Thanks for identifying what was causing the error. Thus, the solution to my original question is:

MATCH (url:URL)
WITH collect(url.url) AS fileURLS
UNWIND fileURLS AS fileURL
WITH fileURL
//RETURN fileURL
CALL apoc.periodic.iterate(
'
LOAD CSV WITH HEADERS FROM $url AS row RETURN row
','
MERGE (state:State {id: row.STATE_CODE_001})
ON CREATE SET state.name = row.STATE_CODE_001
',
{batchSize:1000, parallel:false, params:{url:fileURL}}) YIELD batches, total
RETURN batches, total

Until the issue that is throwing the error from the apoc.load.csv is corrected I have replaced:

CALL apoc.load.csv($url) YIELD map AS row RETURN row

with

LOAD CSV WITH HEADERS FROM $url AS row RETURN row

Topic		Replies	Views
Is apoc.periodic.iterate not releasing memory and causing a websocket error? Procedures & APOC halin	5	1597	August 12, 2019
Using Nested `LOAD CSV` with `apoc.periodic.iterate` Procedures & APOC	5	4244	August 12, 2019
There is no procedure with the name `apoc.periodic.iterate` Procedures & APOC apoc	1	483	December 6, 2023
Nested apoc.periodic.iterate - error Neo4j Graph Platform apoc	3	767	October 23, 2019
Failure when calling apoc.periodic.iterate multiple times Procedures & APOC	4	1315	June 4, 2019

Help with `apoc.periodic.iterate`

Related topics