Question about MATCH, WITH and knowledge graphs best practices

laurange · June 18, 2023, 1:22pm

I'm trying to understand what is the best way to structure queries that create/update batches of data in a knowledge graph without overwriting.

I'm using something like this:

MERGE (person:Person {name: "Steve"})
ON CREATE [... do something with this person if it's being created]
MERGE (person:Person {name: "Frank"})
...

This query fails because the person variable is being set on line 3 but was already set in line 1. I read that WITH might help with this as I could structure every block independently. I wasn't able to do it though.

What is the best practice for create/update operations using batches of data?

laurange · June 18, 2023, 1:35pm

I figured out that if I break the query into separate blocks with ; then each block is handled on its own, one at a time, which works. My question then is: is creating "longer" queries with around 10 operations at a time something to avoid? My data comes in batches of around that size, so it would be easier to perform all of these operations in batches.

glilienfield · June 18, 2023, 2:34pm

Adding the semicolon effectively separated the statements into entirely separate queries. As such, the collision in the “person” variable doesn’t exist. This worked in your case because the statements were indeed independent.

You will have many cases where you need those two statements to execute in the same query because you want to operate on both nodes. The solution is to name the nodes differently, such as, person1 and person2, or frank and steve.
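
As a minimal sketch of that approach, assuming you want to connect the two people afterwards (the createdAt property and the KNOWS relationship type are made up for illustration and are not part of your model):

```cypher
// Each node gets its own variable, so both stay in scope in one query.
MERGE (steve:Person {name: "Steve"})
  ON CREATE SET steve.createdAt = timestamp()   // hypothetical property
MERGE (frank:Person {name: "Frank"})
  ON CREATE SET frank.createdAt = timestamp()   // hypothetical property
// Both variables can now be used together, e.g. to link the nodes.
MERGE (steve)-[:KNOWS]->(frank)                 // hypothetical relationship type
```

Because ON CREATE SET only runs when the node is newly created, existing Steve or Frank nodes keep the properties they already have.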

laurange · June 18, 2023, 7:06pm

Thanks, I'm doing this to create a batch of separate nodes.

Just out of curiosity, would a single query with different variables instead of person be faster?

glilienfield · June 18, 2023, 8:42pm

It will be faster as one statement, as it will be done in a single transaction. You can see the difference if you execute it in the browser: each semicolon-separated statement is executed separately.

You are not required to use variables for every node/relationship. You only need them when you need to reference the entity later in the same query.
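
For instance, if nothing later in the query refers to the nodes, a sketch like this should work without any variables:

```cypher
// No variables: the nodes are merged but never referenced again.
MERGE (:Person {name: "Steve"})
MERGE (:Person {name: "Frank"})
```

As soon as you add something like ON CREATE SET, you need a variable again, because the SET clause has to refer to the node.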

ameyasoft · June 18, 2023, 9:56pm

You are a good teacher!

myron_higerd · June 19, 2023, 9:20pm

If the process is the same for each record (i.e. Steve and Frank), then you could pass these as an array into the process as follows:

WITH ["Steve", "Frank"] as inputPersons
UNWIND inputPersons as inputPerson
MERGE (person:Person {name: inputPerson})
ON CREATE [... do something]

The array elements could also be maps, so you could pass more data associated with each person:

WITH [ 
    {name: "Steve", otherData: 123}, 
    {name: "Frank", otherData: 456}
] as inputPersons
UNWIND inputPersons as inputPerson
MERGE (person:Person {name: inputPerson.name})
ON CREATE SET person.otherData = inputPerson.otherData
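
In practice you would probably not hard-code the list but pass it in as a query parameter from your driver or client. A sketch, assuming a parameter named inputPersons (a hypothetical name) that holds a list of maps with the same name and otherData keys:

```cypher
// $inputPersons is assumed to be supplied by the caller, e.g.
// [{name: "Steve", otherData: 123}, {name: "Frank", otherData: 456}]
UNWIND $inputPersons AS inputPerson
MERGE (person:Person {name: inputPerson.name})
  // Only set on creation, so existing nodes are not overwritten.
  ON CREATE SET person.otherData = inputPerson.otherData
```

The query text stays the same for every batch, and only the parameter value changes.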
