cancel
Showing results for 
Search instead for 
Did you mean: 

Optimization of Cypher query to create nodes

ps1105to
Node

Hi,

I am trying to do the following:

  1. Getting a unique list of a property (prop1) from Node1 based on certain conditions.
  2. Using this list to call an API. The API can accept a max of 500 values and so Step 1 has that restriction built in to get 500 per list at a time.
  3. Using the result from the API, I am trying to create 3 different nodes. To do this, I am matching back again with Node1 to get certain other properties and Node2 to get additional properties. Then I am actually creating the node. The cypher for Step 3 is here and I am trying to use apoc.periodic.iterate to improve performance of creating these nodes. I am facing a few problems:
    a) The nodes are not getting created using this cypher query
    b) This process is taking a very long time.
call apoc.periodic.iterate ('

UNWIND $input as input

WITH input

MATCH (b:Node1) 
WHERE b.prop1 = input.prop1
AND substring(b.prop2,0,10) = "2021-09-08"

WITH input, b
MATCH (c:Node2 {prop3:b.prop3})

RETURN 
b.prop4 as prop4,
b.prop3 as prop3,
c.prop2 as prop2,
input

','

WITH input, prop4, prop3, prop2
MERGE (a:NodeA {prop1:prop4})

SET a.prop2 = "ABC"
SET a.prop3 = prop3
SET a.prop4 = prop2
SET a.prop5 = input.prop1
SET a.prop6 = input.prop2
SET a.date = "2021-09-09"

FOREACH (
_ IN CASE WHEN input.prop10 = "low" THEN [1] END |
SET a:NodeB
)
FOREACH (
_ IN CASE WHEN input.prop10 <> "low" THEN [1] END |
SET a:NodeC
)
FOREACH (
_ IN CASE WHEN input.error IS NOT NULL OR input.errorCode <> 200 OR input IS NULL THEN [1] END |
SET a:NodeD
SET a.errorCode = input.errorCode
SET a.errorDesc = input.error
)
',
{batchSize:100, parallel:true, params:{input:$input}}
)

An additional thing to consider: Node1 is a very large in volume (~1billion). Any pointers as to how I can improve this process or why my cypher query is not creating nodes will be helpful.

1 REPLY 1

You may want to review EXPLAIN plans of your inner queries. The EXPLAIN plan of the entire query won't be helpful, as it will stop at the iterate() call, it won't analyze the cypher strings within it.

You will need to make sure you have supporting indexes for your MATCH and MERGE operations. You do NOT want to see NodeByLabelScan operators (unless you're processing all nodes of the given label instead of matching to specific ones)

In this case, you need indexes on :Node1(prop1) and :Node2(prop3), as well as :NodeA(prop1).
Note that the index can't be used on :Node1(prop2) because you're doing a substring() on the property. It looks like you're dealing with date/time strings, so you might reconsider to use our dateTime types instead, that can let you use index range lookups.