I have the following query, and my file contains about 30 million records. Is there a way to make it run faster?
It has been running for well over 40 minutes and is still going.
CALL apoc.periodic.iterate('
  LOAD CSV WITH HEADERS FROM
  "file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line
  RETURN line.FAREID AS fareid, toInteger(line.TARIFF_NBR) AS tariff
','
  MATCH (f:Fare {ID: fareid})
  MATCH (ft:FareTariff {name: tariff})
  CREATE (f)-[fft:fare_to_faretariff]->(ft)
', {batchSize:1, iterateList:true, parallel:true})
There are fewer tariff numbers than fares, so I am afraid I might get deadlock errors if I use a bigger batchSize.
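(A sketch of one common workaround, not tested against this dataset: because many fares point at the same `FareTariff` node, parallel batches will contend for locks on those shared nodes. Keeping the batches serial with `parallel:false` sidesteps the deadlock risk, which then allows a much larger `batchSize` than 1 — with `batchSize:1` each of the 30 million rows pays full transaction-commit overhead. The 10000 figure below is an illustrative choice, not a measured optimum.)

```cypher
// Sketch: large batches, but serial execution so batches cannot
// deadlock on the shared FareTariff nodes.
CALL apoc.periodic.iterate('
  LOAD CSV WITH HEADERS FROM
  "file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line
  RETURN line.FAREID AS fareid, toInteger(line.TARIFF_NBR) AS tariff
','
  MATCH (f:Fare {ID: fareid})
  MATCH (ft:FareTariff {name: tariff})
  CREATE (f)-[:fare_to_faretariff]->(ft)
', {batchSize:10000, parallel:false})
```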
Thanks @koji. I have created constraints on both nodes. Wouldn't that be enough? I thought constraints created an index.
CREATE CONSTRAINT ON (f:FareTariff) ASSERT f.name IS UNIQUE;
CREATE CONSTRAINT ON (f:FareBasis) ASSERT f.name IS UNIQUE;
I also verified with `CALL db.constraints();` that my constraints were created properly:
"constraint_9fff29c0" "CONSTRAINT ON ( faretariff:FareTariff ) ASSERT (faretariff.name) IS UNIQUE"
"constraint_f599caff" "CONSTRAINT ON ( fare:Fare ) ASSERT (fare.ID) IS UNIQUE"
Is there any other way to check what is causing this to run so slowly?
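(One possible check, sketched with placeholder values: run `EXPLAIN` on the inner query and confirm both `MATCH` clauses show a `NodeUniqueIndexSeek` in the plan rather than a `NodeByLabelScan` — a label scan over millions of `Fare` nodes per row would explain the slowness. It is also worth confirming that `FareTariff.name` was stored as an integer, since `toInteger(line.TARIFF_NBR)` will never match values that were loaded as strings, and a type mismatch can silently prevent the index from being used.)

```cypher
// Sketch: "F123" and 456 are placeholder sample values,
// not real data from the CSV file.
EXPLAIN
MATCH (f:Fare {ID: "F123"})
MATCH (ft:FareTariff {name: 456})
RETURN f, ft;
```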
Could you please try the query below?
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM
"file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line
MATCH (f:Fare {ID: line.FAREID})
MATCH (ft:FareTariff {name: toInteger(line.TARIFF_NBR)})
CREATE (f)-[fft:fare_to_faretariff]->(ft)
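(For readers on newer versions: `USING PERIODIC COMMIT` was deprecated in Neo4j 4.4 and removed in 5.x, where the equivalent batching is `CALL { ... } IN TRANSACTIONS`, run in an auto-commit transaction — e.g. prefixed with `:auto` in Browser. A sketch of the same load in that form:)

```cypher
:auto LOAD CSV WITH HEADERS FROM
"file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line
CALL {
  WITH line
  MATCH (f:Fare {ID: line.FAREID})
  MATCH (ft:FareTariff {name: toInteger(line.TARIFF_NBR)})
  CREATE (f)-[:fare_to_faretariff]->(ft)
} IN TRANSACTIONS OF 1000 ROWS;
```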