Load large CSV with LOAD CSV or Python

Hi, I'm Song.

I'm trying to import my data into Neo4j.
I ran a LOAD CSV query and it imported about 70% of the data. So I ran the same Cypher (it used MERGE) two more times, but nothing changed.

Actually my data is LARGE, about 2 billion rows. I found the neo4j-admin import tool, but it only works on an empty database
(and I already have other data in my DB).

So I tested importing with Python on a small dataset, about 1,500 rows. It works, but it's too slow:
it imports 500 rows in 5 minutes. :cry:
Can Python import data into Neo4j in batches?

If you know of a good way, please help me.

I'm definitely not an expert on this at all, but a couple of thoughts:

  1. Have you set a unique constraint for each node label on a property like the id? If not, adding them will speed up your import time dramatically. See: Examples - Cypher Manual, and the sketch after this list.

  2. You should run the LOAD CSV script in batches of 5,000-10,000 rows at a time, as shown in the sketch below.

  3. Billions of nodes is, I believe, too large for LOAD CSV, but it should be able to handle several million.
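Here's a minimal sketch combining points 1 and 2, assuming Neo4j 4.4 or later; the `Person` label and the `id`/`name` columns are placeholders for your own schema (on older versions, use `USING PERIODIC COMMIT` instead of `CALL { ... } IN TRANSACTIONS`):

```cypher
// Unique constraint so MERGE can look nodes up via an index
CREATE CONSTRAINT person_id IF NOT EXISTS
FOR (p:Person) REQUIRE p.id IS UNIQUE;

// Commit every 10,000 rows instead of one huge transaction
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
CALL {
  WITH row
  MERGE (p:Person {id: row.id})
  SET p.name = row.name
} IN TRANSACTIONS OF 10000 ROWS;
```

Note that `CALL { ... } IN TRANSACTIONS` must run in an implicit (auto-commit) transaction; in Neo4j Browser, prefix the query with `:auto`.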

Hope that's helpful.

You are using the Python driver, so you can create batches of any size you want per transaction. Create a transaction function that receives a list of rows of data through parameters. The query within the transaction function will process each row using `UNWIND` on the list passed as a parameter. Call `execute_write` iteratively with your transaction function until all your batches are exhausted.
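A minimal sketch of that pattern, assuming the Neo4j Python driver 5.x, a local bolt URI, and hypothetical CSV columns `id` and `name` with a placeholder `Person` label:

```python
import csv
from neo4j import GraphDatabase

URI = "bolt://localhost:7687"   # assumption: adjust to your server
AUTH = ("neo4j", "password")    # assumption: your credentials
BATCH_SIZE = 5000

def merge_batch(tx, rows):
    # UNWIND processes every row of the batch inside a single transaction
    tx.run(
        """
        UNWIND $rows AS row
        MERGE (p:Person {id: row.id})
        SET p.name = row.name
        """,
        rows=rows,
    )

def load_csv(path):
    with GraphDatabase.driver(URI, auth=AUTH) as driver:
        with driver.session(database="neo4j") as session:
            batch = []
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    batch.append(row)
                    if len(batch) >= BATCH_SIZE:
                        session.execute_write(merge_batch, batch)
                        batch = []
            if batch:  # flush the final partial batch
                session.execute_write(merge_batch, batch)

if __name__ == "__main__":
    load_csv("data.csv")
```

At 5,000 rows per transaction this avoids both the one-transaction-per-row overhead that makes naive Python imports slow and the memory pressure of committing everything at once.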

As @iuviene stated, make sure you have an index on the property you are using to match your nodes, since a MERGE performs a match each time.
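For example, assuming a `Person` label matched on `id` (if you created the uniqueness constraint above, it already provides this index):

```cypher
CREATE INDEX person_id_index IF NOT EXISTS FOR (p:Person) ON (p.id);
```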