I am using the apoc.periodic.iterate procedure to load data from an Oracle database into Neo4j and to create relationships. I created the nodes first, and now I am matching those existing nodes to create relationships between them.
For example, I have two types of nodes, (:Library) and (:Books). These nodes already exist, and I am trying to create the relationship between a Library and its Books by matching the existing nodes.
CALL apoc.periodic.iterate(
'CALL apoc.load.jdbc("jdbc:oracle:thin:connection_string","select * from library_books") YIELD row',
'MATCH (a:Library),
(b:Books)
WHERE b.book_name=row.BOOK_NAME
AND a.lib_id=row.LIB_ID
CREATE (a)-[y:HAS_BOOKS]->(b)
SET y.lib_book_code=row.CODE',
{ batchSize:10000, parallel:true})
Even when I run this code for only 500 nodes, it never finishes. Can someone please help me with this query, or show me how to write it better?
Thank you!
What happens when you execute just the database call in the browser? Does it return the data and complete in a reasonable amount of time?
CALL apoc.load.jdbc("jdbc:oracle:thin:connection_string","select * from library_books") YIELD row
Do you have indexes on the two properties you are matching on, i.e. Books(book_name) and Library(lib_id)?
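If not, creating them would look something like this. This is a sketch using the Neo4j 4.x CREATE INDEX syntax; the index names are my own invention, and on Neo4j 3.x the older `CREATE INDEX ON :Books(book_name)` form applies instead:

```cypher
// Index the two properties used in the MATCH predicates, so each row
// becomes two index seeks instead of two full label scans.
CREATE INDEX books_book_name IF NOT EXISTS FOR (b:Books) ON (b.book_name);
CREATE INDEX library_lib_id IF NOT EXISTS FOR (l:Library) ON (l.lib_id);
```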
Have you tried with parallel set to false? I imagine the same library node could be touched concurrently in multiple batches, so maybe you are experiencing some lock contention.
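For example, only the config map changes; this is the original call with batches run serially (everything else is unchanged):

```cypher
CALL apoc.periodic.iterate(
'CALL apoc.load.jdbc("jdbc:oracle:thin:connection_string","select * from library_books") YIELD row',
'MATCH (a:Library),
(b:Books)
WHERE b.book_name=row.BOOK_NAME
AND a.lib_id=row.LIB_ID
CREATE (a)-[y:HAS_BOOKS]->(b)
SET y.lib_book_code=row.CODE',
// serial batches: no two transactions can block each other on the same Library node
{ batchSize:10000, parallel:false})
```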
I created the indexes for the properties I am matching on, and the query has now been running for more than an hour for just 300 nodes.
Try this to determine if the problem is caused by nesting the db call within the iterate. I assume you don't have 100,000s of rows to collect; if you do, you can limit the number of rows for debugging purposes by inserting 'WITH row LIMIT 1000' after the db call, before the collect operation.
CALL apoc.load.jdbc("jdbc:oracle:thin:connection_string","select * from library_books") YIELD row
WITH collect(row) as data
CALL apoc.periodic.iterate(
'UNWIND data as row RETURN row',
'MATCH (a:Library),
(b:Books)
WHERE b.book_name=row.BOOK_NAME
AND a.lib_id=row.LIB_ID
CREATE (a)-[y:HAS_BOOKS]->(b)
SET y.lib_book_code=row.CODE',
{ batchSize:10000, parallel:true, params:{data:data}})
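As a side note, writing the two matches with inline property lookups makes the intended index seeks explicit and avoids declaring a cartesian product up front. This is a sketch of the inner statement only, and should behave the same as the WHERE form:

```cypher
MATCH (a:Library {lib_id: row.LIB_ID})
MATCH (b:Books {book_name: row.BOOK_NAME})
// MERGE instead of CREATE here would also make re-runs idempotent
CREATE (a)-[y:HAS_BOOKS]->(b)
SET y.lib_book_code = row.CODE
```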
Thank you so much, Gary!!
It worked, and as you said, it was because of nesting the db call within the iterate. I have more than a million rows.