Question about MATCH, WITH and knowledge graphs best practices

I'm trying to understand what is the best way to structure queries that create/update batches of data in a knowledge graph without overwriting.

I'm using something like this:

MERGE (person:Person {name: "Steve"})
ON CREATE [... do something with this person if it's being created]
MERGE (person:Person {name: "Frank"})
...

This query fails because the person variable is declared on line 3 but was already declared on line 1. I read that WITH might help with this, as I could structure every block independently. I wasn't able to get it working though.

What is the best practice for create/update operations using batches of data?

I figured out that if I break the query into separate blocks with ; then each block is run on its own, which works. My question then is: is creating "longer" queries with around 10 operations at a time something to avoid? My data comes in batches of around that size, so it would be easier to perform all of these together.

Adding the semicolon effectively separated the statements into entirely separate queries. As such, the collision on the "person" variable doesn't exist. This worked in your case because the statements were indeed independent.

You will have many cases where you need those two statements to execute in the same query because you want to operate on both nodes. The solution is to name the nodes differently, such as person1 and person2, or frank and steve.
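
As a minimal sketch of that approach, the query below gives each node its own variable so that both can be used later in the same query. The KNOWS relationship type and the createdAt property are hypothetical and only there for illustration; they are not part of the original question.

```cypher
// Each MERGE binds a different variable, so there is no name collision.
MERGE (steve:Person {name: "Steve"})
  ON CREATE SET steve.createdAt = timestamp()  // hypothetical property, set only when the node is new
MERGE (frank:Person {name: "Frank"})
  ON CREATE SET frank.createdAt = timestamp()
// Both variables are still in scope, so they can be combined in one pattern.
MERGE (steve)-[:KNOWS]->(frank)                // hypothetical relationship type
```

Because ON CREATE only fires when MERGE actually creates the node, re-running this should leave the properties of existing nodes untouched.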

Thanks, I'm doing this to create a batch of separate nodes.

Just a curiosity, would having a single query with different variables instead of person, be faster?

It will be faster as one statement, as it will be done in a single transaction. You can see the difference if you execute it in the browser, where each statement is executed separately.

You are not required to use variables for every node/relationship. You only need them when you need to reference the entity later in the same query.
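
For example, if the nodes only need to exist and nothing refers to them afterwards, a sketch like the following avoids the naming problem altogether by leaving the variables out:

```cypher
// No variables are bound, so nothing can collide.
MERGE (:Person {name: "Steve"})
MERGE (:Person {name: "Frank"})
```

The trade-off is that ON CREATE SET has nothing to refer to here, so this form only fits when no follow-up work on the node is needed.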

You are a good teacher!

If the process is the same for each record (i.e. Steve and Frank), then you could pass these as an array into the process as follows:

WITH ["Steve", "Frank"] as inputPersons
UNWIND inputPersons as inputPerson
MERGE (person:Person {name: inputPerson})
ON CREATE [... do something]

The array could also contain maps, so you could pass more data associated with each person.

WITH [ 
    {name: "Steve", otherData: 123}, 
    {name: "Frank", otherData: 456}
] as inputPersons
UNWIND inputPersons as inputPerson
MERGE (person:Person {name: inputPerson.name})
ON CREATE SET person.otherData = inputPerson.otherData
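
When the batch comes from application code, the same pattern can take the list as a query parameter instead of a literal. This is only a sketch: it assumes the client supplies a parameter named $inputPersons (a hypothetical name) holding a list of maps shaped like the one above, and the lastSeen property is likewise made up for illustration.

```cypher
// $inputPersons is assumed to be supplied by the client,
// e.g. [{name: "Steve", otherData: 123}, {name: "Frank", otherData: 456}]
UNWIND $inputPersons AS inputPerson
MERGE (person:Person {name: inputPerson.name})
  // Runs only when the node did not exist, so existing data is not overwritten.
  ON CREATE SET person.otherData = inputPerson.otherData
  // Runs only when the node already existed (hypothetical property).
  ON MATCH SET person.lastSeen = timestamp()
```

Since the query text stays the same from batch to batch, a batch of around 10 records runs as a single statement in a single transaction.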