neo4j version: Community 4.2.0
dbms.memory.heap.initial_size=25G
dbms.memory.heap.max_size=25G
dbms.memory.pagecache.size=8G
desktop version: 1.3.11
Hello,
What would be the best way to write a Cypher query that deletes an existing node from Neo4j if the unique ID on that node is not present in a CSV file? I think I found a solution, but I don't think I fully understand it. I'm trying to write the query in pure Cypher first so that I can pass it to my Python script using the neo4j driver.
I run daily data-extraction Python scripts that create CSV files, and I need a Cypher query that deletes nodes from the existing Neo4j graph database when a node's ID is not found in the new CSV file.
I think I found the answer here: Neo4J Delete Nodes With Field Value Not in CSV with Cypher - Stack Overflow
The daily csv files with about 100k rows have a format like this:
person_id, fname, lname, location
1, name1, lname1, london
2, name2, lname2, munich
3, name3, lname3, beijing
4, name4, lname4, tokyo
Let's say that on the next day, I run my python script and now the csv looks like this:
person_id, fname, lname, location
1, name1, lname1, london
2, name2, lname2, munich
3, name3, lname3, beijing
What would be the Cypher query to delete the node with person_id = 4?
Based on the Stack Overflow thread, I have this:
:auto USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS
FROM 'file:///'+$filecsv AS row
// Create a collection of the name ID's that you can check against in the cypher below.
WITH COLLECT(toInteger(row.person_id)) AS newlist
//Find the nodes from your graph database to compare against newlist
MATCH(n:People)
WHERE EXISTS (n.name_id)
AND
NOT n.name_id IN newlist
DETACH DELETE n;
Am I right to describe this query as follows?
Step 1: Load the daily file
:auto USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS
FROM 'file:///'+$filecsv AS row
Step 2: Create a collection of the IDs that appear in the new CSV file. COLLECT stores the person_id values in a list so that the name_id values already in the graph database can be checked against it.
// Create a collection of the name ID's that you can check against in the cypher below.
WITH COLLECT(toInteger(row.person_id)) AS newlist
Step 3: Retrieve the existing People nodes from the graph database and delete a node if:
a) the node has a name_id property, and
b) that name_id is not found in the new list.
//Find the nodes from your graph database to compare against newlist
MATCH(n:People)
WHERE EXISTS (n.name_id)
AND
NOT n.name_id IN newlist
DETACH DELETE n;
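To check the three steps against the sample data, here is a minimal plain-Python sketch of the same logic with no database involved. GRAPH_NAME_IDS is a hypothetical stand-in for the name_id values on the existing People nodes (None marks a node without the property), and the function names are made up for illustration.

```python
import csv
import io

# Hypothetical stand-ins: the second day's CSV from this post, and the
# name_id values currently on People nodes (None = property missing).
NEW_CSV = """person_id, fname, lname, location
1, name1, lname1, london
2, name2, lname2, munich
3, name3, lname3, beijing
"""
GRAPH_NAME_IDS = [1, 2, 3, 4, None]


def ids_in_csv(text):
    """Steps 1 and 2: read the rows and collect person_id as integers."""
    # skipinitialspace drops the blank after each comma in the sample file.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [int(row["person_id"]) for row in reader]


def ids_to_delete(graph_ids, new_ids):
    """Step 3: keep ids that exist (not None) and are absent from the CSV."""
    wanted = set(new_ids)
    return [i for i in graph_ids if i is not None and i not in wanted]


newlist = ids_in_csv(NEW_CSV)
stale = ids_to_delete(GRAPH_NAME_IDS, newlist)
print(newlist, stale)  # prints [1, 2, 3] [4]
```

With the second day's file, only 4 ends up in the delete list, which is what the Cypher query is expected to remove. The node without a name_id is left alone, mirroring the EXISTS check.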
This seems to be working for me, but is there a better way to write this if I plan to execute this query with the Python neo4j driver on a daily basis?
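One possible shape for the daily run from Python, as a rough sketch and not the query from the Stack Overflow thread: since :auto is a Neo4j Browser command and not Cypher, this variant reads the IDs in Python and passes them to the query as an $ids parameter instead of using LOAD CSV. The function names, connection details, and file name are hypothetical, and the session object is assumed to behave like a neo4j driver session.

```python
import csv

# Same filter as the LOAD CSV version, but the id list arrives as a parameter.
DELETE_QUERY = """
MATCH (n:People)
WHERE n.name_id IS NOT NULL AND NOT n.name_id IN $ids
DETACH DELETE n
"""


def read_person_ids(csv_path):
    """Collect person_id values from the daily CSV as integers."""
    with open(csv_path, newline="") as handle:
        # skipinitialspace handles the blank after each comma in the file.
        reader = csv.DictReader(handle, skipinitialspace=True)
        return [int(row["person_id"]) for row in reader]


def delete_missing_people(session, csv_path):
    """Run the delete and return the number of nodes reported as deleted."""
    ids = read_person_ids(csv_path)
    result = session.run(DELETE_QUERY, ids=ids)
    return result.consume().counters.nodes_deleted


def main():
    """Hypothetical entry point; needs the neo4j package and a running server."""
    from neo4j import GraphDatabase

    # Assumed connection details and file name, for illustration only.
    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
    with driver.session() as session:
        print(delete_missing_people(session, "people.csv"))
    driver.close()
```

Passing the list as a parameter keeps the Cypher text constant between runs, and the CSV no longer has to sit in the server's import directory. Whether this is preferable to LOAD CSV for files of about 100k rows is something I have not measured.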
Any insights would be greatly appreciated.
Thank you