APOC Periodic Iterate with APOC Load JDBC To Load Data in Batches

carib · August 31, 2021, 6:28pm

Hello,

What is the best way to import 2,000,000 rows of data into Neo4j from a postgresql database? By following the docs on apoc.load.jdbc, I managed to install the postgresql driver and to query the postgresql by yielding as rows in the Neo4j browser the data that I need.

But I'm confused in this part of the docs here to load data in batches:

Does this mean that the first statement in apoc.periodic.iterate ('CALL apoc.load.jdbc("jdbc:mysql://localhost:3306/northwind?user=root","company")') runs first and then that entire data is partitioned into batches of 10000 rows and then loaded using the second statement (CREATE (p:Person) SET p += value)?

If my understanding is correct, does that mean that if my sql query to be passed to the apoc.load.jdbc is something like:

SELECT dt, employee_id, department_id
FROM my_table
WHERE dt > now() - INTERVAL \'24 hours\'
ORDER BY dt DESC, employee_id

and this returns 2,000,000 rows, then apoc.periodic.iterate will take the results of this query and load them to Neo4j in batchSize of 10,000 at a time?

What would be a faster way to load data from postgresql to Neo4j?

Thanks

sameer.gijare14 · September 4, 2021, 12:28pm

Hello

I think the way could be to export the data from postgre into CSV files with 20000 rows in each CSV and load them periodically into Neo4j through cron scheduler.

Thanking you
Yours faithfully
Sameer Gijare

carib · September 9, 2021, 9:57pm

Hi Sameer,

Thank you for your input. If I export csv files with 20,000 rows in each, we're talking about 100 files. There are days when my sql query returns 5,000,000 rows so breaking up the output from postgres into multiple csv files may not scale over time.

What if I do batch transactions with apoc.periodic.iterate?

sameer.gijare14 · September 13, 2021, 2:41pm

You can write your own script to read input csvs whether large or small. It could be small java program that polls a particular folder periodically, connects to graph DB ingests the data. You can if it is required to transform the data in your domain business objects. Later on once tested you can install it as a plug in in neo4j with configuration file for tuning dynamic parameters.
Thanking you
Sameer Sudhir G

Topic		Replies	Views
Preform batch data synchronization between a postgreSQL DB and a neo4j DB Import / Export	2	431	April 17, 2023
Struggling with apoc.periodic.iterate to load data from RDBMS to Neo4j Procedures & APOC	2	515	November 4, 2020
Rapid data ingest : 1 min intervals Import / Export	5	834	March 29, 2019
How to batch json records using apoc library for better importing? Procedures & APOC apoc , performance , cypher , import , json , batching	4	1345	October 16, 2021
Load pandas data frame to neo4j database in batches Import / Export	1	979	September 25, 2023

July Summer Fun!

APOC Periodic Iterate with APOC Load JDBC To Load Data in Batches

Related topics