LOAD CSV long loading times

I know this question has been asked many times, but none of those threads gave me a straightforward solution for my loading times.

I am pretty sure my loading times are too long, and I am not sure whether the cause is the query, the software, or the hardware. So far my LOAD CSV runs take minutes, while the admin import takes a few seconds.

E.g. for a CSV file of 14k lines I am getting the following result:

Added 14019 labels, created 14019 nodes, set 28037 properties, created 14018 relationships, completed after 183360 ms

The file contains date/time and value measurements in the following format:

2019-07-10T19:38:00.062000|51.744617
2019-07-10T19:39:00.065000|52.153733
2019-07-10T19:40:00.066000|51.226583
2019-07-10T19:41:00.069000|51.341583
2019-07-10T19:42:00.070000|51.524967

My query is:

LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
FIELDTERMINATOR '|'
MERGE (d:Data{time:datetime(row.Time),value:row.Value})
MERGE (s:Sensor{Name:"Test"})   
MERGE (s)-[:HAS_DATA]->(d)

I am using Desktop version 1.4.5 and database version 4.2.5.

Hi @IFC_modeller!

Slow loading times may indicate that your MERGE is spending too much time deciding whether the node it is about to create already exists in the DB. In this case, the variability of the :Data properties could be the issue. Since 14k nodes were created, every MERGE effectively turned into a CREATE, so you can simply use CREATE instead. Otherwise, you may want to use the data differently in your model.

Last but most important: create constraints on your DB if you are planning to use MERGE, use the constrained property as the selector, and then split your logic with ON MATCH / ON CREATE.
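For example, assuming you key :Data nodes on their timestamp (property names taken from the original query; the constraint name is arbitrary), a sketch on Neo4j 4.x syntax could look like this:

```
// Uniqueness constraint so MERGE can use an index lookup instead of a scan
CREATE CONSTRAINT data_time IF NOT EXISTS
ON (d:Data) ASSERT d.time IS UNIQUE;

// MERGE on the constrained property only, set the rest on first creation
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
FIELDTERMINATOR '|'
MERGE (d:Data {time: datetime(row.Time)})
  ON CREATE SET d.value = toFloat(row.Value)
MERGE (s:Sensor {Name: "Test"})
MERGE (s)-[:HAS_DATA]->(d);
```

With the constraint in place, each MERGE becomes an index lookup rather than a label scan over all existing :Data nodes.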

Hoping to be useful.

H

Hi @IFC_modeller,

actually, I presume your sensor "Test" exists only once. You should create it separately and reference it in the query with a MATCH instead of a MERGE.

The second point, as @Bennu said: MERGE is meant to be used when you want to avoid creating a node or relationship twice. If a Data node existed twice, how would you handle it? Always take the first one? From the information given, there is no indication that duplicate data recordings must be prevented, since they are all bound to the sensor. One strategy would be to build them with CREATE and then, if some recording appears twice, delete one of the duplicates.
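A dedup pass along those lines could be sketched as follows (an assumption on my part: duplicates are defined by having the same time property; the query keeps the first node of each group and detaches the rest):

```
// Group :Data nodes by timestamp, keep the first node of each duplicate group
MATCH (d:Data)
WITH d.time AS t, collect(d) AS nodes
WHERE size(nodes) > 1
FOREACH (dup IN nodes[1..] | DETACH DELETE dup);
```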

LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
FIELDTERMINATOR '|'
MATCH (s:Sensor{Name:"Test"})   
CREATE (s)-[:HAS_DATA]->(d:Data{time:datetime(row.Time),value:row.Value})
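The MATCH above assumes the sensor node already exists. If it might not, create it once beforehand (label and property taken from the original query; MERGE keeps this idempotent):

```
MERGE (:Sensor {Name: "Test"});
```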

Hi @Benoit_d
Thank you. By using CREATE instead of MERGE, the import took only a second.

Thanks for your help. It seems I was overusing MERGE by taking shortcuts instead of doing it properly.