Ability to determine deleted nodes/relationships?

mbandor · February 22, 2022, 3:38pm

I've been working on an import script (APOC based) to load the contents of a customer-built Excel spreadsheet and build the nodes & relationships. So far that has gone well for the initial import and creation of the graph. As this spreadsheet is updated monthly by the customer, I originally was going to use a modified version of the import script to update the graph. I should be able to check for the existence of the nodes and relationships just fine (MERGE with ON CREATE, ON MATCH, etc.). However, how should I handle a node or relationship that existed previously but no longer appears in the update (a likely possibility)? It almost sounds like the better (easier) option is just to rebuild the graph from each monthly update from the customer. Otherwise this is basically a reverse parsing situation (parse the graph and compare it with the spreadsheet).

Your thoughts?

dkm1006 · February 28, 2022, 9:16am

If you don’t want to delete all nodes and relationships and then create all of them again, you could add a property lastUpdated which is set during the update import. After the update import you could then delete only the nodes which were not updated within the last 24 hours or so.
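
A minimal sketch of the marking side in Cypher, assuming the spreadsheet rows are passed in as a list parameter `$rows` and modelled as hypothetical `Product` and `Supplier` nodes joined by a `SUPPLIES` relationship (none of these names come from the thread; adapt them to your own model). Every node and relationship the import touches gets the same timestamp, whether or not its data changed:

```cypher
// Hypothetical model: $rows, Product, Supplier, SUPPLIES, productId and supplierId
// are illustrative names only. Assumes both ids are non-null in every row.
WITH datetime() AS runStart          // one timestamp for the whole import run
UNWIND $rows AS row
MERGE (p:Product {id: row.productId})
SET p.last_imported = runStart       // stamp the node even if nothing else changed
MERGE (s:Supplier {id: row.supplierId})
SET s.last_imported = runStart
MERGE (s)-[r:SUPPLIES]->(p)
SET r.last_imported = runStart       // stamp relationships too, so they can be swept separately
```

Existing ON CREATE SET / ON MATCH SET clauses for the other properties can stay as they are; the unconditional SET is what makes a row that vanished from the spreadsheet detectable afterwards.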

mbandor · February 28, 2022, 4:40pm

I do have a Last_Updated property on the nodes. Where things get interesting is that the information may not change for months or even years, so I don't want to introduce the potential for false positives in queries due to old information still contained in the graph (e.g., a product no longer being tracked for obsolescence).

dkm1006 · March 1, 2022, 4:54pm

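Maybe we had a misunderstanding. To make clear what I meant, let’s call the proposed property last_imported. Unlike a change-tracking Last_Updated, it would be set on every node the import touches, whether or not its data changed, so any node without a recent value is no longer in the spreadsheet. As the last step of your import script you would then run something like: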

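// Run as an auto-commit transaction (e.g. prefix the query with :auto in Neo4j Browser);
// CALL { ... } IN TRANSACTIONS is not allowed inside an explicit transaction.
MATCH (n) WHERE n.last_imported < datetime() - duration({hours:24})
CALL {
    WITH n
    DETACH DELETE n   // DETACH also removes the node's relationships
} IN TRANSACTIONS OF 1000 ROWS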
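
The node sweep removes stale nodes, and DETACH DELETE takes their relationships with them. A relationship that disappears from the spreadsheet while both of its endpoints are still present would survive, though. Assuming relationships are also stamped with last_imported during the import, a matching sweep for them might look like this (again run as an auto-commit transaction):

```cypher
// Assumes every imported relationship carries a last_imported datetime.
// Relationships without the property are left alone: the comparison yields null.
MATCH ()-[r]->()
WHERE r.last_imported < datetime() - duration({hours: 24})
CALL {
    WITH r
    DELETE r
} IN TRANSACTIONS OF 1000 ROWS
```

Instead of a fixed 24-hour window, the import could pass its own start time as a parameter (for example a hypothetical `$runStart`) and delete everything with `last_imported < $runStart`. That keeps the sweep correct regardless of how long the import takes or how close together two runs are.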

mbandor · March 1, 2022, 6:05pm

Hmm, I hadn't considered this as the last step. You might be onto something. Thanks for the suggestion!

dkm1006 · March 2, 2022, 8:20am

You’re welcome :) Don’t forget to put an index on last_imported if you go along with this solution. That will speed up the deletion step considerably, I think.
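
Neo4j property indexes are defined per label (or per relationship type), and the planner only uses one when the query names that label, so the sweep needs a labelled MATCH rather than a bare `MATCH (n)`. A sketch for a hypothetical `Product` label and `SUPPLIES` type, assuming Neo4j 4.4 or later; run each statement separately, the sweep as an auto-commit transaction:

```cypher
// Hypothetical index names, label and relationship type.
CREATE INDEX product_last_imported IF NOT EXISTS
FOR (n:Product) ON (n.last_imported);

CREATE INDEX supplies_last_imported IF NOT EXISTS
FOR ()-[r:SUPPLIES]-() ON (r.last_imported);

// Labelled sweep, so the index can serve the range lookup on last_imported.
MATCH (n:Product)
WHERE n.last_imported < datetime() - duration({hours: 24})
CALL {
    WITH n
    DETACH DELETE n
} IN TRANSACTIONS OF 1000 ROWS;
```

With several labels, repeat the index and the sweep for each one; a relationship sweep likewise needs the type in its pattern (e.g. `[r:SUPPLIES]`) to use the relationship index.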
