ETL for loading csv

samik.mukherjee · July 10, 2020, 2:23pm

Hi,
I am trying to load a CSV file of relationships. I am using LOAD CSV and it is very slow. I do not want to use the import tool, because that requires an empty database and I already have data present. I am wondering whether Neo4j ETL would be faster?
I am currently running the following query through the Neo4j Browser:

:auto USING PERIODIC COMMIT 10000 LOAD CSV WITH HEADERS FROM "file:///ACTIVE_INGREDIENTS_BY_COUNTRY_COUNTRY_MAPPING.csv" AS row
MATCH (ac:ActiveIngredientsByCountry {ACTIVE_INGREDIENT_BY_COUNTRY_ID: row.ACTIVE_INGREDIENT_BY_COUNTRY_ID}),
      (c:Country {COUNTRY_CODE: row.COUNTRY_CODE})
CREATE (ac)-[:ACTIVE_INGREDIENT_BY_COUNTRY_COUNTRY_ASSOCIATION]->(c)

It creates relationships between two kinds of entities: ACTIVE_INGREDIENTS_BY_COUNTRY (about 400k nodes) and COUNTRY (about 3 nodes).

This query takes about 3 days and we need to make it faster. How can I do this in a database that already contains data?

Thanks,
Samik

webtic · July 11, 2020, 3:40pm

What I would do is create a script which reads the CSV, does any pre-processing needed, and applies the Cypher to the database. Personally I would grab Python because I am proficient in it.

Without seeing the actual data it is always a guess, but from what you sketch I would not be surprised if you could optimise it into something that takes hours instead of days.
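
To make the script idea concrete, here is a minimal sketch of what such a loader could look like. It reads the CSV in batches and sends each batch to the database as a single parameterised UNWIND query. The labels and property names are copied from the query in the question. The helper names (batches, load_relationships, make_neo4j_runner), the batch size, and the connection details are my own assumptions for illustration, and it assumes the official neo4j Python driver is installed. It also assumes the two matched properties are indexed, which the question does not state; without indexes every MATCH has to scan all nodes of that label, and batching alone will not help much.

```python
import csv
from itertools import islice

# Cypher applied once per batch. Labels and property names are taken from
# the query in the question; $rows is a list of dicts, one per CSV line.
BATCH_QUERY = """
UNWIND $rows AS row
MATCH (ac:ActiveIngredientsByCountry {ACTIVE_INGREDIENT_BY_COUNTRY_ID: row.ACTIVE_INGREDIENT_BY_COUNTRY_ID})
MATCH (c:Country {COUNTRY_CODE: row.COUNTRY_CODE})
CREATE (ac)-[:ACTIVE_INGREDIENT_BY_COUNTRY_COUNTRY_ASSOCIATION]->(c)
"""


def batches(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def load_relationships(csv_path, run_query, batch_size=10000):
    """Read the CSV and call run_query(query, rows) once per batch.

    `run_query` is any callable (hypothetical interface) that executes the
    query with `rows` bound to the $rows parameter. Returns the row count.
    """
    total = 0
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # DictReader uses the header line as keys, like LOAD CSV WITH HEADERS.
        for chunk in batches(csv.DictReader(handle), batch_size):
            run_query(BATCH_QUERY, chunk)
            total += len(chunk)
    return total


def make_neo4j_runner(uri, user, password):
    """Build a run_query callable backed by the official neo4j driver.

    The import is done lazily so the rest of the script can be tried out
    without the driver. Returns (run_query, driver); close the driver when done.
    """
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))

    def run_query(query, rows):
        # One auto-commit transaction per batch.
        with driver.session() as session:
            session.run(query, rows=rows).consume()

    return run_query, driver
```

Usage would be something like run_query, driver = make_neo4j_runner("bolt://localhost:7687", "neo4j", "password"), then load_relationships("ACTIVE_INGREDIENTS_BY_COUNTRY_COUNTRY_MAPPING.csv", run_query), then driver.close(). The URI and credentials are placeholders. Keeping the database call behind a plain callable also means you can dry-run the CSV reading and batching with a dummy function before touching the real database.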
