I am using the apoc.periodic.iterate procedure to load data from an Oracle database into Neo4j and to create relationships. I created the nodes first, and now I am matching those existing nodes to create relationships between them.
For example, I have two types of nodes, (:Library) and (:Books). These nodes already exist, and I am trying to create the relationship between a Library and its Books by matching the existing nodes.
CALL apoc.periodic.iterate(
'CALL apoc.load.jdbc("jdbc:oracle:thin:connection_string","select * from library_books") YIELD row',
'MATCH (a:Library),
(b:Books)
WHERE b.book_name=row.BOOK_NAME
AND a.lib_id=row.LIB_ID
CREATE (a)-[y:HAS_BOOKS]->(b)
SET y.lib_book_code=row.CODE',
{ batchSize:10000, parallel:true})
Even when I run this code for only 500 nodes, it never finishes. Can someone please help me with this query, or show me how to write it better?
Thank you!
What happens when you execute just the database call in the browser? Does it return the data and complete in a reasonable amount of time?
CALL apoc.load.jdbc("jdbc:oracle:thin:connection_string","select * from library_books") YIELD row
Do you have indexes on the two properties you are matching on, i.e. Books(book_name) and Library(lib_id)?
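If not, creating them would look something like this. This is a sketch using the Neo4j 4.x CREATE INDEX syntax; the index names are my own invention, and on Neo4j 3.x the older `CREATE INDEX ON :Books(book_name)` form applies instead:

```cypher
// Index the two properties used in the MATCH predicates, so each row
// becomes two index seeks instead of two full label scans.
CREATE INDEX books_book_name IF NOT EXISTS FOR (b:Books) ON (b.book_name);
CREATE INDEX library_lib_id IF NOT EXISTS FOR (l:Library) ON (l.lib_id);
```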
Have you tried with parallel set to false? I imagine the same library node could be touched concurrently in multiple batches, so maybe you are experiencing some lock contention.
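For example, only the config map changes; this is the original call with batches run serially (everything else is unchanged):

```cypher
CALL apoc.periodic.iterate(
'CALL apoc.load.jdbc("jdbc:oracle:thin:connection_string","select * from library_books") YIELD row',
'MATCH (a:Library),
(b:Books)
WHERE b.book_name=row.BOOK_NAME
AND a.lib_id=row.LIB_ID
CREATE (a)-[y:HAS_BOOKS]->(b)
SET y.lib_book_code=row.CODE',
// serial batches: no two transactions can block each other on the same Library node
{ batchSize:10000, parallel:false})
```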
I created the indexes for the properties I am matching on, and the query has now been running for more than an hour for just 300 nodes.
Try this to determine if the problem is caused by nesting the db call within the iterate. I assume you don't have 100,000s of rows to collect; if you do, you can limit the number of rows for debugging purposes by inserting 'WITH row LIMIT 1000' after the db call, before the collect operation.
CALL apoc.load.jdbc("jdbc:oracle:thin:connection_string","select * from library_books") YIELD row
WITH collect(row) as data
CALL apoc.periodic.iterate(
'UNWIND data as row RETURN row',
'MATCH (a:Library),
(b:Books)
WHERE b.book_name=row.BOOK_NAME
AND a.lib_id=row.LIB_ID
CREATE (a)-[y:HAS_BOOKS]->(b)
SET y.lib_book_code=row.CODE',
{ batchSize:10000, parallel:true, params:{data:data}})
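As a side note, writing the two matches with inline property lookups makes the intended index seeks explicit and avoids declaring a cartesian product up front. This is a sketch of the inner statement only, and should behave the same as the WHERE form:

```cypher
MATCH (a:Library {lib_id: row.LIB_ID})
MATCH (b:Books {book_name: row.BOOK_NAME})
// MERGE instead of CREATE here would also make re-runs idempotent
CREATE (a)-[y:HAS_BOOKS]->(b)
SET y.lib_book_code = row.CODE
```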
Thank you so much, Gary!!
It worked, and as you said, it was because of nesting the db call within the iterate. I have more than a million rows.