I have the following query, and my file contains about 30 million records. Is there a way to make it run faster?
It has been running for well over 40 minutes and is still going.
CALL apoc.periodic.iterate('
  LOAD CSV WITH HEADERS FROM
  "file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line
  RETURN line.FAREID AS fareid, toInteger(line.TARIFF_NBR) AS tariff
','
  MATCH (f:Fare {ID: fareid})
  MATCH (ft:FareTariff {name: tariff})
  CREATE (f)-[fft:fare_to_faretariff]->(ft)
', {batchSize:1, iterateList:true, parallel:true})
There are fewer tariff numbers than fares, so I am afraid I might get deadlock errors if I use a bigger batchSize.
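(A sketch of one common workaround, not tested against this dataset: because many fares point at the same `FareTariff` node, parallel batches will contend for locks on those shared nodes. Keeping the batches serial with `parallel:false` sidesteps the deadlock risk, which then allows a much larger `batchSize` than 1 — with `batchSize:1` each of the 30 million rows pays full transaction-commit overhead. The 10000 figure below is an illustrative choice, not a measured optimum.)

```cypher
// Sketch: large batches, but serial execution so batches cannot
// deadlock on the shared FareTariff nodes.
CALL apoc.periodic.iterate('
  LOAD CSV WITH HEADERS FROM
  "file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line
  RETURN line.FAREID AS fareid, toInteger(line.TARIFF_NBR) AS tariff
','
  MATCH (f:Fare {ID: fareid})
  MATCH (ft:FareTariff {name: tariff})
  CREATE (f)-[:fare_to_faretariff]->(ft)
', {batchSize:10000, parallel:false})
```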
Thanks @koji. I have created constraints on both nodes. Wouldn't that be enough? I thought constraints created an index.
CREATE CONSTRAINT ON (f:FareTariff) ASSERT f.name IS UNIQUE;
CREATE CONSTRAINT ON (f:FareBasis) ASSERT f.name IS UNIQUE;
I also verified with `CALL db.constraints();` that my constraints were created properly:
"constraint_9fff29c0" "CONSTRAINT ON ( faretariff:FareTariff ) ASSERT (faretariff.name) IS UNIQUE"
"constraint_f599caff" "CONSTRAINT ON ( fare:Fare ) ASSERT (fare.ID) IS UNIQUE"
Is there any other way to check what is causing this to run so slowly?
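(One possible check, sketched with placeholder values: run `EXPLAIN` on the inner query and confirm both `MATCH` clauses show a `NodeUniqueIndexSeek` in the plan rather than a `NodeByLabelScan` — a label scan over millions of `Fare` nodes per row would explain the slowness. It is also worth confirming that `FareTariff.name` was stored as an integer, since `toInteger(line.TARIFF_NBR)` will never match values that were loaded as strings, and a type mismatch can silently prevent the index from being used.)

```cypher
// Sketch: "F123" and 456 are placeholder sample values,
// not real data from the CSV file.
EXPLAIN
MATCH (f:Fare {ID: "F123"})
MATCH (ft:FareTariff {name: 456})
RETURN f, ft;
```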
Could you please try the query below?
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM
"file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line
MATCH (f:Fare {ID: line.FAREID})
MATCH (ft:FareTariff {name: toInteger(line.TARIFF_NBR)})
CREATE (f)-[fft:fare_to_faretariff]->(ft)
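(For readers on newer versions: `USING PERIODIC COMMIT` was deprecated in Neo4j 4.4 and removed in 5.x, where the equivalent batching is `CALL { ... } IN TRANSACTIONS`, run in an auto-commit transaction — e.g. prefixed with `:auto` in Browser. A sketch of the same load in that form:)

```cypher
:auto LOAD CSV WITH HEADERS FROM
"file:///home/pdss01/satish/intl/part-00004-4df038b2-3d1d-47ba-ad35-f391e09d7306-c000.csv" AS line
CALL {
  WITH line
  MATCH (f:Fare {ID: line.FAREID})
  MATCH (ft:FareTariff {name: toInteger(line.TARIFF_NBR)})
  CREATE (f)-[:fare_to_faretariff]->(ft)
} IN TRANSACTIONS OF 1000 ROWS;
```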