I have the following LOAD CSV Cypher query wrapped inside apoc.periodic.iterate.
I tried to increase the batch size and I am getting locking errors. It is loading at a very slow pace. What is a way to increase the loading speed without getting deadlock errors?
I tried increasing the batch size and setting parallel to false, but it is still very slow. I need to load three files like this, each with 1.5 million records. It ran for almost two hours and all I could see was 500K records processed. I have a box with 24 cores and 96 GB of memory, and the CPU is barely used.
Is there a way to make this go faster? Thanks a bunch in advance.
What version of Neo4j? What version of APOC?
Do you have any indexes defined on the labels involved in the query? Can you post the output of the Cypher statement?
Thanks @dana_canzano. I started with LOAD CSV, but it was way too slow as well, so I thought putting it inside apoc.periodic.iterate would increase the parallelism, but no luck so far.
I came across a tool called Kettle (Pentaho Data Integration) in the hope that it would let me load the data faster. It looks like this product is a hassle to even install. Does anyone use it with the Neo4j plugins beyond just playing around?
In general it's not a good idea to execute loading queries that create relationships in parallel, since relationship creation requires locks on both nodes. Running those batches in parallel can lead to lock contention and deadlocks.
A higher batch size without parallel tends to work fairly well.
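For example, the overall shape would be something like this. This is only a sketch, not your actual query: the file name, column names, and relationship types below are placeholders.

CALL apoc.periodic.iterate(
  // outer statement streams rows from the CSV
  "LOAD CSV WITH HEADERS FROM 'file:///freight.csv' AS row RETURN row",
  // inner statement does the MERGEs for each row
  "MERGE (c:Company {name: row.company})
   MERGE (t:FrieghtReadingTariff {name: row.tariff})
   MERGE (b:FreightBasis {name: row.basis})
   MERGE (c)-[:HAS_TARIFF]->(t)
   MERGE (t)-[:HAS_BASIS]->(b)",
  // larger batches, no parallelism, to avoid lock contention on relationship creation
  {batchSize: 10000, parallel: false}
);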
You should of course make sure you have the right indexes created to support your initial MATCH and MERGE operations; in this case, indexes on:
:Company(name), :FrieghtReadingTariff(name), and :FreightBasis(name)
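For example, assuming a 3.x install, that would be:

CREATE INDEX ON :Company(name);
CREATE INDEX ON :FrieghtReadingTariff(name);
CREATE INDEX ON :FreightBasis(name);

On 4.x the equivalent syntax is CREATE INDEX FOR (c:Company) ON (c.name), and a unique constraint on the same property also gives you a backing index, so either works.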
Also run an EXPLAIN of the query and make sure there are no Eager operators. While these are often needed for correct execution, they require all results to be materialized at that point in the plan, which effectively disables periodic commit behavior and is another ingredient in locking issues and potential deadlocks.
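To check, prefix the inner statement with EXPLAIN and look for Eager operators in the plan, e.g. (again with placeholder file and column names):

EXPLAIN
LOAD CSV WITH HEADERS FROM 'file:///freight.csv' AS row
MERGE (c:Company {name: row.company});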
I think I ran a command which suggested these numbers, and I am using them. It definitely got me out of the OutOfMemory exceptions I was getting earlier.
Total memory on the box is 96 GB, and I always see about 50 GB of it free.
I have been running EXPLAIN on the other queries, but I don't understand exactly how to interpret the output and improve the plan. Is there any video out there that can explain this EXPLAIN? :-)
I did create unique constraints on Company, FrieghtReadingTariff, and FreightBasis, as shown below.
CREATE CONSTRAINT ON (f:FreightBasis) ASSERT f.name IS UNIQUE;
CREATE CONSTRAINT ON (f:Company) ASSERT f.name IS UNIQUE;
CREATE CONSTRAINT ON (f:FrieghtReadingTariff) ASSERT f.name IS UNIQUE;
Is this what you mean by creating an index?
I am thinking I should break this loading query into separate jobs, so I can create the nodes in parallel batches first and then create the relationships in batches with parallelism turned off.
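Something roughly like this, I suppose (the file name, column names, and relationship types are placeholders, not my actual query):

// First pass: create the nodes; parallel should be fine here since no relationship locks are taken
// (though parallel node MERGE can still contend if many rows share the same key)
CALL apoc.periodic.iterate(
  "LOAD CSV WITH HEADERS FROM 'file:///freight.csv' AS row RETURN row",
  "MERGE (:Company {name: row.company})
   MERGE (:FrieghtReadingTariff {name: row.tariff})
   MERGE (:FreightBasis {name: row.basis})",
  {batchSize: 10000, parallel: true}
);

// Second pass: create the relationships serially to avoid lock contention and deadlocks
CALL apoc.periodic.iterate(
  "LOAD CSV WITH HEADERS FROM 'file:///freight.csv' AS row RETURN row",
  "MATCH (c:Company {name: row.company})
   MATCH (t:FrieghtReadingTariff {name: row.tariff})
   MATCH (b:FreightBasis {name: row.basis})
   MERGE (c)-[:HAS_TARIFF]->(t)
   MERGE (t)-[:HAS_BASIS]->(b)",
  {batchSize: 10000, parallel: false}
);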
Hello guys,
I am new to Neo4j and I am using Neo4j Enterprise edition on a server. I am running a LOAD CSV query to create nodes, but the query is taking about 40 hours to run. My data size is about 1.7 million rows.
Here is my query.
Is there any way it can run faster?
:auto USING PERIODIC COMMIT 2000
LOAD CSV WITH HEADERS FROM "file:///data.csv" AS line
MERGE (s:State {State: line.State})
MERGE (p:Product {ProductId: line.ProductId, Price: line.ProductPrice, ProductName: line.ProductName})
MERGE (c:Category {Category: line.Category})
MERGE (cr:CategoryRollUp {CategoryRollUp: line.CategoryRollUp})
MERGE (cu:Customer {CustomerId: line.CustomerId, Grouping: line.NewGrouping})
MERGE (cu)-[:InteractsWith {Date: line.TransactionDate, Month: line.Month, EventScore: line.EventScore, Event: line.Event, Quantity: line.QtySold}]->(p)
MERGE (s)-[:HasUser]->(cu)
MERGE (c)-[:HasProduct]->(p)
MERGE (cr)-[:HasCategory]->(c)
Could you please tell us how much data you have to load?
Are you loading via the Browser, or through Java/Python code?
Did you not see much improvement with apoc.periodic.iterate?
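If you haven't tried it yet, something along these lines is usually worth a shot. This is only a sketch: it wraps your LOAD CSV in apoc.periodic.iterate, MERGEs each node on a single key property (setting the remaining properties ON CREATE so the lookup can use an index), and assumes you first create unique constraints on State(State), Product(ProductId), Category(Category), CategoryRollUp(CategoryRollUp), and Customer(CustomerId). It also assumes TransactionDate is enough to identify an interaction; adjust the relationship MERGE if it is not.

CALL apoc.periodic.iterate(
  "LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS line RETURN line",
  "MERGE (s:State {State: line.State})
   MERGE (p:Product {ProductId: line.ProductId})
     ON CREATE SET p.Price = line.ProductPrice, p.ProductName = line.ProductName
   MERGE (c:Category {Category: line.Category})
   MERGE (cr:CategoryRollUp {CategoryRollUp: line.CategoryRollUp})
   MERGE (cu:Customer {CustomerId: line.CustomerId})
     ON CREATE SET cu.Grouping = line.NewGrouping
   // assumes one interaction per customer/product/date; change the MERGE key if that is wrong
   MERGE (cu)-[i:InteractsWith {Date: line.TransactionDate}]->(p)
     ON CREATE SET i.Month = line.Month, i.EventScore = line.EventScore,
                   i.Event = line.Event, i.Quantity = line.QtySold
   MERGE (s)-[:HasUser]->(cu)
   MERGE (c)-[:HasProduct]->(p)
   MERGE (cr)-[:HasCategory]->(c)",
  {batchSize: 2000, parallel: false}
);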