What is the best way to import 2,000,000 rows of data into Neo4j from a PostgreSQL database? By following the docs on apoc.load.jdbc, I managed to install the PostgreSQL driver and to query PostgreSQL from the Neo4j Browser, yielding the data I need as rows.
But I'm confused by the part of the docs about loading data in batches:
Does this mean that the first statement in apoc.periodic.iterate ('CALL apoc.load.jdbc("jdbc:mysql://localhost:3306/northwind?user=root","company")') runs first, and then the entire result set is partitioned into batches of 10,000 rows and loaded using the second statement (CREATE (p:Person) SET p += value)?
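For reference, assembling the two quoted statements into a full call, the docs example presumably looks something like this (the batchSize value is taken from the docs; everything else is from the fragments quoted above):

```cypher
CALL apoc.periodic.iterate(
  'CALL apoc.load.jdbc("jdbc:mysql://localhost:3306/northwind?user=root","company")',
  'CREATE (p:Person) SET p += value',
  {batchSize: 10000}
)
```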
If my understanding is correct, does that mean that if my sql query to be passed to the apoc.load.jdbc is something like:
SELECT dt, employee_id, department_id FROM my_table WHERE dt > now() - INTERVAL '24 hours' ORDER BY dt DESC, employee_id
and this returns 2,000,000 rows, then apoc.periodic.iterate will take the results of this query and load them into Neo4j in batches of 10,000 (batchSize: 10000) at a time?
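In other words, would something like the following sketch do what I described? (The JDBC URL, credentials, and the :Employee label are placeholders; the query is the one above.)

```cypher
CALL apoc.periodic.iterate(
  'CALL apoc.load.jdbc($url, $query) YIELD row RETURN row',
  'CREATE (e:Employee) SET e += row',
  {batchSize: 10000, parallel: false,
   params: {
     url: "jdbc:postgresql://localhost:5432/mydb?user=me&password=secret",
     query: "SELECT dt, employee_id, department_id FROM my_table WHERE dt > now() - INTERVAL '24 hours' ORDER BY dt DESC, employee_id"
   }}
)
```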
What would be a faster way to load data from postgresql to Neo4j?
Thank you for your input. If I export CSV files with 20,000 rows each, we're talking about 100 files. There are days when my SQL query returns 5,000,000 rows, so breaking up the output from Postgres into multiple CSV files may not scale over time.
What if I do batch transactions with apoc.periodic.iterate?
You can write your own script to read the input CSVs, whether large or small. It could be a small Java program that polls a particular folder periodically, connects to the graph DB, and ingests the data. If required, it can also transform the data into your domain business objects. Later, once tested, you can install it as a plugin in Neo4j with a configuration file for tuning dynamic parameters.
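Whatever drives the polling, the per-file ingestion step such a program runs could still be batched with apoc.periodic.iterate, along these lines (the file path and the :Employee label are hypothetical placeholders):

```cypher
CALL apoc.periodic.iterate(
  'LOAD CSV WITH HEADERS FROM $file AS line RETURN line',
  'CREATE (e:Employee) SET e += line',
  {batchSize: 10000, params: {file: "file:///import/batch_001.csv"}}
)
```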
Sameer Sudhir G