Help with `apoc.periodic.iterate`


(Michael McKenzie) #1

I have been working to do a massive data import using in another post. As I have been trying to figure this out I have encountered the following error:

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure `apoc.periodic.iterate`: Caused by: java.lang.ArrayIndexOutOfBoundsException: 131

Does anyone know what this might be referring to?
Below is the code that caused the error:

MATCH (url:URL)
WITH collect(url.url) AS fileURLS
UNWIND fileURLS AS fileURL
WITH fileURL
//RETURN fileURL
CALL apoc.periodic.iterate(
'
CALL apoc.load.csv($url) YIELD map AS row RETURN row
','
MERGE (state:State {id: row.STATE_CODE_001})
ON CREATE SET state.name = row.STATE_CODE_001
',
{batchSize:1000, parallel:false, params:{url:fileURL}}) YIELD batches, total
RETURN batches, total

(Stefan Armbruster) #2

does it work if you run your apoc.periodic.iterate statement on only one single file?


(Michael McKenzie) #3

Yes. For example the following code runs just fine:

CALL apoc.periodic.iterate(
'
CALL apoc.load.csv("https://docs.google.com/spreadsheets/d/1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A/export?format=csv&id=1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A&gid=1318941318") YIELD map AS row RETURN row
//LOAD CSV WITH HEADERS FROM "https://docs.google.com/spreadsheets/d/1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A/export?format=csv&id=1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A&gid=1318941318" AS row RETURN row
','
WITH CASE
	WHEN NOT row.Year IS NULL THEN collect(row.URL)
    END AS fileURLs
UNWIND fileURLs AS fileURL
CREATE (url:URL {url: fileURL})
',
{batchSize: 100, parallel:false})

(Stefan Armbruster) #4

My suspicion is there might be a bug causing apoc.periodic.iterate not to clean up things propertly. Most people using that procedure use it only once per transaction. In your code, you're calling it multiple times.

It would be cool if you could reproduce this in a minimal example and file a github ticket for it.


(Michael McKenzie) #5

If you run the snippet of code you just looked at:

CALL apoc.periodic.iterate(
'
CALL apoc.load.csv("https://docs.google.com/spreadsheets/d/1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A/export?format=csv&id=1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A&gid=1318941318") YIELD map AS row RETURN row
//LOAD CSV WITH HEADERS FROM "https://docs.google.com/spreadsheets/d/1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A/export?format=csv&id=1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A&gid=1318941318" AS row RETURN row
','
WITH CASE
	WHEN NOT row.Year IS NULL THEN collect(row.URL)
    END AS fileURLs
UNWIND fileURLs AS fileURL
CREATE (url:URL {url: fileURL})
',
{batchSize: 100, parallel:false})

this will create set of nodes containing URLs of files I want to import. This is connected to a shared google sheets file.

Then if you try running the original lines of code in this thread:

MATCH (url:URL)
WITH collect(url.url) AS fileURLS
UNWIND fileURLS AS fileURL
WITH fileURL
//RETURN fileURL
CALL apoc.periodic.iterate(
'
CALL apoc.load.csv($url) YIELD map AS row RETURN row
','
MERGE (state:State {id: row.STATE_CODE_001})
ON CREATE SET state.name = row.STATE_CODE_001
',
{batchSize:1000, parallel:false, params:{url:fileURL}}) YIELD batches, total
RETURN batches, total

you will query the nodes to try and load data from the file locations attached to each node. This should let you recreate the issue where you are.

This issue is connected to a previous post I made.

Essentially what I am trying to accomplish is to load a Google Sheets/CSV file containing the web url file locations for large sets of data I want to import.


(Michael McKenzie) #6

@stefan.armbruster I just realized that the CASE statement I use in

CALL apoc.periodic.iterate(
'
CALL apoc.load.csv("https://docs.google.com/spreadsheets/d/1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A/export?format=csv&id=1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A&gid=1318941318") YIELD map AS row RETURN row
//LOAD CSV WITH HEADERS FROM "https://docs.google.com/spreadsheets/d/1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A/export?format=csv&id=1sFcY7LFBCGXSFG336UPoOf72BBv3bmv_AVaYLxwiV4A&gid=1318941318" AS row RETURN row
','
WITH CASE
	WHEN NOT row.Year IS NULL THEN collect(row.URL)
    END AS fileURLs
UNWIND fileURLs AS fileURL
CREATE (url:URL {url: fileURL})
',
{batchSize: 100, parallel:false})

is not working as intended and is likely the culprit of the issue. The Google Sheet that has the list of URLs has NULL because I still have to validate the URL before loading the data. My goal was to ignore this rows with the WITH CASE, but that isn't happening.

However, once I add `WHERE NOT url.url CONTAINS "nbi//" to avoid that error, I end back up with the initial error

Neo.ClientError.Procedure.ProcedureCallFailed: Failed to invoke procedure `apoc.periodic.iterate`: Caused by: java.lang.ArrayIndexOutOfBoundsException: 131

(Michael Hunger) #7

it's an issue with apoc.load.csv and your data, we'll have to look into that. see the GH issue.


(Michael McKenzie) #8

@michael.hunger Thanks for identifying what was causing the error. Thus, the solution to my original question is:

MATCH (url:URL)
WITH collect(url.url) AS fileURLS
UNWIND fileURLS AS fileURL
WITH fileURL
//RETURN fileURL
CALL apoc.periodic.iterate(
'
LOAD CSV WITH HEADERS FROM $url AS row RETURN row
','
MERGE (state:State {id: row.STATE_CODE_001})
ON CREATE SET state.name = row.STATE_CODE_001
',
{batchSize:1000, parallel:false, params:{url:fileURL}}) YIELD batches, total
RETURN batches, total

Until the issue that is throwing the error from the apoc.load.csv is corrected I have replaced:

CALL apoc.load.csv($url) YIELD map AS row RETURN row

with

LOAD CSV WITH HEADERS FROM $url AS row RETURN row