LOAD CSV taking time

pragyasood28 · September 23, 2021, 2:26pm

Hi Everyone,

I am loading a CSV file to create nodes in Neo4j. With a CSV of around 1,000 rows, the nodes were created in 1 second; when I increased the dataset to 3,000 rows, it took 15 seconds.

Can someone please suggest how to reduce this time, and explain why there is such a difference for 3,000 rows?

What is the best way to create a graph from a CSV with a large dataset?

Below is the query that I use to create my graph nodes and set the properties:


LOAD CSV WITH HEADERS FROM 'file:///storage.csv' AS line
MERGE (a:Storage {name: line.code + " " + date(line.Date).month + "-" + date(line.Date).year + " " + line.Product})
ON CREATE SET
  a.Incoming_Stock = toFloat(line.incoming_stock),
  a.Opening_Inv_Physical = toFloat(line.opening_inventory_physical),
  a.Target_Closing_Inv = toFloat(line.target_closing_inventory),
  a.Outflow_Requirement = toFloat(line.outflow_requirement),
  a.date = date(line.Date),
  a.Product = line.Product,
  a.Node = line.code;

Thanks

Cobra · September 23, 2021, 2:34pm

Hello @pragyasood28

Did you create a UNIQUE CONSTRAINT on a node property?

Regards,
Cobra

pragyasood28 · September 23, 2021, 2:41pm

Hi,

No, I haven't created a UNIQUE CONSTRAINT, but the name property I create for my nodes will always be unique, because that's how I build my input CSV. So I know that duplicate nodes will not be created.

I just tried creating a unique constraint on my Storage node, after importing the CSV and creating the nodes. But how will this reduce the time taken to load the CSV?

It's taking 15 seconds just to create 3,000 nodes in my graph.

Thanks,
Pragya

Cobra · September 23, 2021, 3:02pm

Have a look here.

pragyasood28 · September 24, 2021, 10:06am

Hi,
My nodes are now created within milliseconds since I changed my Cypher query from MERGE to CREATE, but creating my relationships takes around 60 seconds. Any suggestions on how to speed up relationship creation from a CSV?

Below is the code that I have used:


LOAD CSV WITH HEADERS FROM 'file:///transport_laporte_db10.csv' AS line
MATCH (sender:Storage {name: line.sender_node + " " + date(line.sender_date).month + "-" + date(line.sender_date).year + " " + line.Product})
MATCH (receiver:Storage {name: line.receiver_node + " " + date(line.receiver_date).month + "-" + date(line.receiver_date).year + " " + line.Product})
MERGE (sender)-[rel:transport {mode: line.mode, lead_time: toInteger(line.lead_time), quota: toInteger(line.quota)}]->(receiver);

Thanks

pragyasood28 · September 24, 2021, 5:36pm

Creating an index helped me reduce the time taken to create the relationships.
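
For readers following along, here is a minimal sketch of what such an index might look like. The thread does not say which property was indexed, so this assumes it was the name property on :Storage, since that is what the MATCH clauses above look up. The index name storage_name is hypothetical, and the syntax assumes Neo4j 4.x or later.

```cypher
// Assumption: the index is on the property used to look up :Storage nodes.
// The index name storage_name is hypothetical.
CREATE INDEX storage_name IF NOT EXISTS
FOR (s:Storage) ON (s.name);
```

If name really is unique per node, a uniqueness constraint on the same property, as Cobra suggested, is also index-backed and should serve the same lookups. The exact constraint syntax differs between Neo4j versions, so it is worth checking the documentation for the version in use.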

andrew_bowman · September 25, 2021, 12:54am

Just some context for this...

When there is no index present, then for each row Cypher will do a label scan over every single :Storage node, performing property access on each one to see if a matching node exists.

So if you have 10,000 :Storage nodes in the database and 3,000 rows in the CSV, it will perform 3,000 label scans, meaning that it will ultimately do 3,000 * 10,000 = 30,000,000 node comparisons. So the loading time becomes proportional to the number of nodes with the given label * the number of rows in your CSV, and that's only considering a single MERGE. If there are multiple MERGEs on nodes that aren't index-backed, then the problem compounds.

By contrast, when there is an index in place, only one index lookup is performed per row, so 3,000 index lookups, which are quite quick.
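
One way to check which of these two plans a lookup gets is to prefix it with EXPLAIN, which shows the query plan without running the query. A minimal sketch, assuming the :Storage label and name property from the queries earlier in the thread; the literal name value is a made-up placeholder:

```cypher
// Show the planned operators without executing the query.
// The name value below is a hypothetical placeholder, not real data.
EXPLAIN
MATCH (s:Storage {name: 'A1 9-2021 ProductX'})
RETURN s;
```

Without an index on :Storage(name), the plan should contain a NodeByLabelScan followed by a Filter. With an index or a uniqueness constraint in place, it should show a NodeIndexSeek or NodeUniqueIndexSeek instead.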
