Which is worse: reading a big object (about 70,000 rows) several times, or optimizing these difficult queries? (Node.js)

pavloN · November 16, 2019, 7:56am

I received a large file, and the two queries below perform badly on it, so they need to be optimized.
Splitting the queries up seems like a good idea, but if I split them, I will have to go through this big file several times.
Which is better: splitting the queries, or improving them as they are?
Maybe somebody has ideas about how to optimize them.

CALL apoc.periodic.iterate('WITH apoc.convert.fromJsonList($data) as arr UNWIND arr as v RETURN v'
,'
FOREACH ( i in CASE WHEN v.dog=false THEN [1] ELSE [] END | MERGE (c:Cat{id:v.id, version: "${version}"}))
FOREACH ( i in CASE WHEN v.dog=true THEN [1] ELSE [] END | MERGE (c:Dog{id:v.id, version: "${version}"}))
WITH v
MATCH (c{id:v.id, version:"${version}"})
UNWIND RANGE(0, CASE WHEN length(v.weightOfAllCat) > length(v.weightOfAllDog) THEN length(v.weightOfAllCat) ELSE length(v.weightOfAllDog) END) as i
MERGE (p:Prod {ean: v.name, version: "${version}"})
MERGE (a:Pro {ean: v.name, version: "${version}"})

WITH v, c, p, a
CALL apoc.do.when(v.dog=false,  "MERGE (c)-[:PRI]->(p)  MERGE (c)-[:ALTER]->(a)",
"MERGE (c)-[:PRI_A]->(p) MERGE (c)-[:ALTER_A]->(a)",
{v:v, c:c, p:p, a:a}) YIELD value
RETURN value
',
{ batchSize: 5000, iterateList: true, parallel:true, params:{data:'${data}'}})

UNWIND split("${prod}", ",") as prod_id
MATCH (p:Prod{id:prod_id, version: "${version}"})<-[a:ALTER]-(c:Cat{version: "${version}"})
WITH max(toInteger(apoc.text.replace(c.id,'[A-Za-z+]', ""))) as max, p
MATCH (c:Cat{version: "${version}"})-[:ALTER]->(p)
WHERE toInteger(apoc.text.replace(c.id,'[A-Za-z+]', "")) =max
MATCH (c:Cat{version: "${version}"})-[d:ALTER]->(p)
MERGE (c)-[:PRIM]->(p)
MERGE (c)-[:ALTER_P]->(p)
DETACH DELETE d
RETURN collect(DISTINCT(p.prod_id)) as proc

Thomas_Silkjaer · November 26, 2019, 10:29am

First of all, you are running apoc.periodic.iterate in parallel mode while also adding relationships. When a relationship is created, locks are taken on both connected nodes, so you risk a deadlock (unless you are sure that no two batches create relationships to the same nodes anywhere in the set).
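To illustrate the locking point above, here is a minimal Node.js sketch that builds a non-parallel apoc.periodic.iterate call and passes the rows as query parameters instead of interpolating them into the string. `buildCatImport` is a hypothetical helper, and the sketch assumes the rows were already filtered to cats in Node.js; the labels and properties are copied from the question, and the Dog case would follow the same shape.

```javascript
// Hypothetical helper: builds one apoc.periodic.iterate call plus its parameters.
// Assumption: `rows` contains only cat rows (dog === false), filtered in Node.js.
function buildCatImport(rows, version) {
  const cypher = [
    'CALL apoc.periodic.iterate(',
    "  'UNWIND $rows AS v RETURN v',",
    "  'MERGE (c:Cat {id: v.id, version: $version})",
    '   MERGE (p:Prod {ean: v.name, version: $version})',
    "   MERGE (c)-[:PRI]->(p)',",
    // parallel: false avoids concurrent batches locking the same Prod nodes.
    '  {batchSize: 5000, parallel: false, params: {rows: $rows, version: $version}}',
    ')'
  ].join('\n');
  return { cypher, params: { rows, version } };
}

// One pass over the big object splits it by type before any query runs.
const allRows = [
  { id: 'A1', name: 'n1', dog: false },
  { id: 'B2', name: 'n2', dog: true }
];
const catRows = allRows.filter((row) => row.dog === false);
const catImport = buildCatImport(catRows, 'v1');
```

With the official driver, the result would typically be passed along as `session.run(catImport.cypher, catImport.params)`; filtering in Node.js costs one in-memory pass over the 70,000 rows, which is cheap compared with the database writes.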

MATCH (c {id:v.id, version:"${version}"}) does not specify a label, so it cannot use an index. Adding a label and an index on those properties would help.

You are also matching and merging nodes on multiple properties, e.g. MERGE (p:Prod {ean: v.name, version: "${version}"}). Are these properties covered by composite indexes?
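As a sketch of the indexing suggestion above, the snippet below renders one composite index statement per label/property pair used in the question's MERGE and MATCH patterns. `indexStatement` and the generated index names are hypothetical; the statement syntax depends on the Neo4j version, so both the older `CREATE INDEX ON` form and the newer `CREATE INDEX ... FOR ... ON` form are shown.

```javascript
// Label/property pairs taken from the MERGE and MATCH patterns in the question.
const lookups = [
  { label: 'Cat', props: ['id', 'version'] },
  { label: 'Dog', props: ['id', 'version'] },
  { label: 'Prod', props: ['ean', 'version'] },
  { label: 'Pro', props: ['ean', 'version'] }
];

// Hypothetical helper: renders one composite index statement per lookup.
// legacy = true  -> Neo4j 3.x form: CREATE INDEX ON :Label(a, b)
// legacy = false -> Neo4j 4.x+ form: CREATE INDEX name FOR (n:Label) ON (n.a, n.b)
function indexStatement({ label, props }, legacy = true) {
  if (legacy) {
    return `CREATE INDEX ON :${label}(${props.join(', ')})`;
  }
  const name = [label.toLowerCase(), ...props].join('_');
  const keys = props.map((p) => `n.${p}`).join(', ');
  return `CREATE INDEX ${name} FOR (n:${label}) ON (${keys})`;
}

const statements = lookups.map((lookup) => indexStatement(lookup));
```

Each statement would be run once before the import. The indexes only help if the patterns name the label, for example `MATCH (c:Cat {id: v.id, version: $version})` rather than the unlabeled form.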

Is this being imported into an existing database? If not, preprocessing the content into CSV files and using neo4j-admin import is likely the fastest approach (depending on the size of the dataset).
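If the CSV route applies, the preprocessing step could look roughly like the sketch below, which turns the rows into a node file with an `id:ID,version,:LABEL` header of the kind neo4j-admin import reads. `toNodeCsv` and `csvField` are hypothetical helpers, and the sketch assumes ids are unique across cats and dogs (otherwise separate ID spaces would be needed). Relationship files are not covered here.

```javascript
// Quote a field only when it contains a comma, quote, or line break.
function csvField(value) {
  const text = String(value);
  return /[",\n\r]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Hypothetical helper: one CSV line per row, labeled Cat or Dog by the dog flag.
// Assumption: row.id is unique across all rows (single global ID space).
function toNodeCsv(rows, version) {
  const lines = ['id:ID,version,:LABEL'];
  for (const row of rows) {
    const label = row.dog ? 'Dog' : 'Cat';
    lines.push([row.id, version, label].map(csvField).join(','));
  }
  return lines.join('\n') + '\n';
}

const sampleCsv = toNodeCsv(
  [
    { id: 'A1', dog: false },
    { id: 'B2', dog: true }
  ],
  'v1'
);
```

The resulting string can be written to disk with `fs.writeFileSync` and handed to the import tool; this only works for an initial load into an empty database, which is why the question about an existing database matters.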

pavloN · November 26, 2019, 4:54pm

@Thomas_Silkjaer Thank you!
