I am wondering if there are any best practices for managing bulk-importable data, and then using "subsets" of it to perform live updates.
I'll try to explain:
(whenever I say "bulk importable", I mean CSVs that fit the neo4j-admin import format)
My "full" set of bulk-importable data gets updated from the master sources every day. This allows me to instantiate a brand-new, fully populated database at any time, extremely fast.
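For reference, the import step is roughly the following (Neo4j 5 syntax; the labels, relationship type, and paths are just illustrative):

```
neo4j-admin database import full \
  --nodes=Person=import/source_a/persons_header.csv,import/source_a/persons.csv \
  --relationships=WORKS_AT=import/source_b/works_at_header.csv,import/source_b/works_at.csv \
  --overwrite-destination \
  neo4j
```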
The bulk-importable data lives in multiple folders, one per master data source, with separate header files.
This is super convenient, as I can restore at any time, as well as develop against multiple databases (e.g. desktop plus a container DB).
However, I also need to keep a running database updated. I currently have separate scripts for this that poll the master data sources and update the database using Python/Bolt. A lot of the logic has been rewritten for these scripts so they can create the same relationships as the scripts that construct the bulk-importable data.
While this works, it does not feel like the best way of doing it, and it will be a challenging solution to scale, as "understanding" of the model must be injected in multiple places.
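To illustrate the duplication: the update scripts re-encode the model (labels, merge keys, relationship types) that the bulk-CSV builder already knows. A minimal sketch of what such a script does (all names here are hypothetical, not from my actual code):

```python
def person_company_updates(rows):
    """Turn polled rows from a master source into idempotent Cypher.

    Each row is a dict like {"person_id": ..., "company_id": ...}.
    The MERGE mirrors the WORKS_AT relationship that the bulk-CSV
    builder writes out as a relationship file for neo4j-admin import,
    so the same model knowledge now lives in two places.
    """
    query = (
        "MERGE (p:Person {id: $person_id}) "
        "MERGE (c:Company {id: $company_id}) "
        "MERGE (p)-[:WORKS_AT]->(c)"
    )
    return [(query, dict(row)) for row in rows]

# With the official neo4j Python driver this would be executed roughly as:
#   with driver.session() as session:
#       for query, params in person_company_updates(rows):
#           session.run(query, params)
```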
It feels like this could possibly be done purely by organizing the bulk-importable CSVs, where admin import "grabs it all", and the update script only grabs some of the files and applies them, utilizing the structure from the bulk-importable files.
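One way I imagine keeping the model in a single place would be to derive the live-update Cypher from the same header files that drive neo4j-admin import, so only the CSV builder has to "understand" the model. A rough sketch (the header syntax follows the admin-import convention; the function and its assumptions are mine):

```python
import csv
import io


def node_merge_from_header(header_line, label):
    """Build a parameterized MERGE from a neo4j-admin import node header,
    e.g. "personId:ID,name:string,age:int".

    Assumes the :ID column is the merge key. The label is passed in
    explicitly, since in my layout it comes from the folder/filename
    rather than a :LABEL column.
    """
    fields = next(csv.reader(io.StringIO(header_line)))
    key = None
    props = []
    for field in fields:
        name, _, ftype = field.partition(":")
        if ftype.startswith("ID"):
            key = name            # merge key, e.g. personId:ID
        elif name and not ftype.startswith("LABEL"):
            props.append(name)    # plain property column

    query = f"MERGE (n:{label} {{{key}: ${key}}})"
    set_clause = ", ".join(f"n.{p} = ${p}" for p in props)
    if set_clause:
        query += f" SET {set_clause}"
    return query
```

The update script could then read a subset of the data files, pair each row with the statement generated from its header file, and run them over Bolt, instead of hand-writing the Cypher per source.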
Something like this:
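For example, a layout where admin import consumes everything, while the update script only picks up the delta files (naming is just illustrative):

```
import/
  source_a/
    persons_header.csv
    persons.csv          # full set, rebuilt daily
    persons.delta.csv    # rows changed since last poll
  source_b/
    companies_header.csv
    companies.csv
    works_at_header.csv
    works_at.csv
    works_at.delta.csv
```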
Are there best practices around flows that can both keep the "bulk importable data" updated and keep a live DB updated?
Are there best practices for organizing your bulk-importable data that make this easier (e.g. filenames or folders)?