apoc.periodic.iterate leading to java.lang.StackOverflowError

Hello!

I'm hoping someone has experience with this; there are several similar threads, but none of the solutions seem to work for me.

I'm trying to batch-create a series of relationships in a database with pre-existing nodes.
I have a set of csv.gz files that I am processing individually, the largest of which holds details for ~60,000,000 relationships.

The following query leads to a java.lang.StackOverflowError.
The reason for the MERGE on the relationship is that each Species has separate files for breeding and non-breeding traits.

CALL apoc.periodic.iterate(
  "CALL apoc.load.csv('/path/to/file/102998555_breeding.csv.gz') YIELD map AS row RETURN row",
  'MATCH (c:Cell {cellid: toInteger(row.sid)})
   MATCH (s:Species {taxid: toInteger(102998555)})
   MERGE (c)-[r:CONTAINS]->(s)
   SET r.breeding_range = toFloat(row.range),
       r.breeding_aoh   = toFloat(row.aoh)',
  {batchSize: 100000, iterateList: true, parallel: true}
);

The query fails even if I set batchSize:100 and parallel:false.

System details:

neo4j-desktop-1.4.15-x86_64.AppImage
database version 4.4.7
apoc version 4.4.0.5
dbms.memory.heap.initial_size=4G
dbms.memory.heap.max_size=4G
dbms.memory.pagecache.size=1G
dbms.tx_state.memory_allocation=ON_HEAP

Thank you!

From what I've read, apoc.periodic.iterate runs the first statement to produce the stream of rows, then runs the second statement for each batch of results from the first. I suspect the first stage is demanding a huge amount of memory to collect the result of reading your huge file, so that it can be batched and processed by the second stage. One way to indirectly confirm that the issue is in the first stage is to change the second statement to something like 'RETURN 1' and see if you still get a stack overflow error, as in the sketch below.
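
A minimal (untested) diagnostic: your first statement unchanged, a no-op second statement, and conservative batch settings:

// If this still overflows, the failure is in producing/collecting
// the rows, not in the MATCH/MERGE work of the second statement.
CALL apoc.periodic.iterate(
  "CALL apoc.load.csv('/path/to/file/102998555_breeding.csv.gz') YIELD map AS row RETURN row",
  'RETURN 1',
  {batchSize: 100, iterateList: true, parallel: false}
);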

If the above is true, the two-phase approach of apoc.periodic.iterate seems problematic for your use case. Maybe try Cypher's CALL { ... } IN TRANSACTIONS subquery instead.

I tried editing the apoc.periodic.iterate call with 'RETURN 1' as the second statement, but I get the same error.

I also tried the following query, pulling MATCH (s:Species) out of the CALL subquery so the matched node can be reused for every row. But I get another error:

Neo.DatabaseError.Statement.ExecutionFailed null

:auto LOAD CSV WITH HEADERS FROM "file:///path/to/file/102998555_breeding.csv.gz" AS row
MATCH (s:Species{taxid:toInteger(102998555)})
CALL {
  WITH s, row
  MATCH (c:Cell{cellid:toInteger(row.sid)})
  MERGE (c)-[r:CONTAINS]->(s)
  SET
    r.breeding_range = toFloat(row.range),
    r.breeding_aoh   = toFloat(row.aoh)
  } IN TRANSACTIONS OF 1000 ROWS;


I don't think Cypher's LOAD CSV supports compressed files, and your file path ends with a '.gz' extension. apoc.load.csv apparently does support compressed files.

Also, maybe move the first MATCH to before the LOAD CSV, so it is not repeated for each row, e.g.:
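
Something like this (untested; it keeps your path and property names, and drops the redundant toInteger() around the taxid literal):

:auto MATCH (s:Species {taxid: 102998555})
LOAD CSV WITH HEADERS FROM "file:///path/to/file/102998555_breeding.csv.gz" AS row
CALL {
  WITH s, row
  MATCH (c:Cell {cellid: toInteger(row.sid)})
  MERGE (c)-[r:CONTAINS]->(s)
  SET r.breeding_range = toFloat(row.range),
      r.breeding_aoh   = toFloat(row.aoh)
} IN TRANSACTIONS OF 1000 ROWS;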

LOAD CSV does work with .gz files; many smaller csv.gz files were imported successfully.
That said, I suspect the batched-transaction machinery doesn't cope with large csv.gz files: the file in question was imported very quickly once I un-gzipped it before running the import.
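
For example, the original apoc.periodic.iterate call works once it points at the uncompressed file (only the extension changes):

CALL apoc.periodic.iterate(
  // same statements as the failing query, but loading the un-gzipped .csv
  "CALL apoc.load.csv('/path/to/file/102998555_breeding.csv') YIELD map AS row RETURN row",
  'MATCH (c:Cell {cellid: toInteger(row.sid)})
   MATCH (s:Species {taxid: toInteger(102998555)})
   MERGE (c)-[r:CONTAINS]->(s)
   SET r.breeding_range = toFloat(row.range),
       r.breeding_aoh   = toFloat(row.aoh)',
  {batchSize: 100000, iterateList: true, parallel: true}
);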

I wouldn't call this strictly solved, as the issue is still there.

Are you stating you were able to get the large file imported as needed once the file was uncompressed first?

Yes, that is correct.

Both apoc.load.csv and LOAD CSV work with large .csv files and with small .csv.gz files. Both fail with large .csv.gz files.