Optimization of Cypher query to create nodes

ps1105to · September 28, 2021, 5:23pm

Hi,

I am trying to do the following:

Getting a unique list of a property (prop1) from Node1 based on certain conditions.
Using this list to call an API. The API can accept a max of 500 values and so Step 1 has that restriction built in to get 500 per list at a time.
Using the result from the API, I am trying to create 3 different nodes. To do this, I am matching back again with Node1 to get certain other properties and Node2 to get additional properties. Then I am actually creating the node. The cypher for Step 3 is here and I am trying to use apoc.periodic.iterate to improve performance of creating these nodes. I am facing a few problems:
a) The nodes are not getting created using this cypher query
b) This process is taking a very long time.

call apoc.periodic.iterate ('

UNWIND $input as input

WITH input

MATCH (b:Node1) 
WHERE b.prop1 = input.prop1
AND substring(b.prop2,0,10) = "2021-09-08"

WITH input, b
MATCH (c:Node2 {prop3:b.prop3})

RETURN 
b.prop4 as prop4,
b.prop3 as prop3,
c.prop2 as prop2,
input

','

WITH input, prop4, prop3, prop2
MERGE (a:NodeA {prop1:prop4})

SET a.prop2 = "ABC"
SET a.prop3 = prop3
SET a.prop4 = prop2
SET a.prop5 = input.prop1
SET a.prop6 = input.prop2
SET a.date = "2021-09-09"

FOREACH (
_ IN CASE WHEN input.prop10 = "low" THEN [1] END |
SET a:NodeB
)
FOREACH (
_ IN CASE WHEN input.prop10 <> "low" THEN [1] END |
SET a:NodeC
)
FOREACH (
_ IN CASE WHEN input.error IS NOT NULL OR input.errorCode <> 200 OR input IS NULL THEN [1] END |
SET a:NodeD
SET a.errorCode = input.errorCode
SET a.errorDesc = input.error
)
',
{batchSize:100, parallel:true, params:{input:$input}}
)

An additional thing to consider: Node1 is a very large in volume (~1billion). Any pointers as to how I can improve this process or why my cypher query is not creating nodes will be helpful.

andrew_bowman · September 28, 2021, 7:04pm

You may want to review EXPLAIN plans of your inner queries. The EXPLAIN plan of the entire query won't be helpful, as it will stop at the iterate() call, it won't analyze the cypher strings within it.

You will need to make sure you have supporting indexes for your MATCH and MERGE operations. You do NOT want to see NodeByLabelScan operators (unless you're processing all nodes of the given label instead of matching to specific ones)

In this case, you need indexes on :Node1(prop1) and :Node2(prop3), as well as :NodeA(prop1).
Note that the index can't be used on :Node1(prop2) because you're doing a substring() on the property. It looks like you're dealing with date/time strings, so you might reconsider to use our dateTime types instead, that can let you use index range lookups.

Topic		Replies	Views
Optimizing the writing of large amounts of data in neo4j with apoc Parquet, periodic iterate Procedures & APOC apoc , performance , cypher	2	530	November 24, 2023
Cypher query optimization General migrated	13	174	August 30, 2022
Create edge using apoc.periodic.iterate suffer from Cartesian product Neo4j Graph Platform migrated	8	155	January 6, 2023
Statement using Apoc Periodic Iterate gets stuck, but works without the iterate Cypher apoc , cypher	3	183	March 10, 2023
Creating Large Number of Edges with `apoc.periodic.iterate` Cypher apoc , performance , cypher	3	412	October 24, 2023

Optimization of Cypher query to create nodes

Related topics