
Best practice to manage bulk importable data and use these live updates

espen_solbu

Hello,
I am wondering whether there are any best practices for managing bulk importable data, and then using "subsets" of that data to perform live updates.
I'll try to explain:
(whenever I say "bulk importable", I mean CSVs in the format that neo4j-admin import accepts)

My "full" set of bulk importable data gets updated from the master sources every day. This allows me to instantiate a brand new, fully populated database at any time, extremely fast.
The bulk importable data lives in multiple folders per master data source, with separate header files.

This is super convenient, as I can restore at any time, as well as do development against multiple databases (e.g. Desktop plus a container db).

However, I also need to keep a running database updated. I currently have separate scripts for this that poll the master data sources and update the database using Python/Bolt. A lot of the logic had to be rewritten in these scripts so that they create the same relationships as the scripts that construct the bulk importable data.

3X_4_3_435ea0f4e145274cc4db79ad2a80ccb880eb5394.png

While this works, it does not feel like the best way of doing it, and it will be hard to scale, because the "understanding" of the model has to be implemented in multiple places.

It feels like this could possibly be done purely through the organization of the bulk importable CSVs: admin import "grabs it all", while the update script grabs only some of the files and applies them, reusing the structure defined in the bulk importable files (e.g. the header files).

Something like this:
3X_c_b_cbe480f72cfefcf82d0b1154f216fe73e9055739.png
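
As a rough sketch of that idea in Python: the update script could read the same header files that admin import uses and derive a batched Cypher statement from them, so the model is only described once. Everything here is an assumption for illustration: the header and data shown are made up, the helper names (`parse_node_header`, `build_merge_query`, `rows_from_csv`) are hypothetical, each file is assumed to hold nodes of a single label, and only a few property types are handled.

```python
import csv
import io

# Converters for a few property types that can appear in import headers.
# (Illustrative subset only; arrays, dates etc. are not handled here.)
CONVERTERS = {
    "int": int, "long": int, "float": float, "double": float,
    "boolean": lambda v: v.lower() == "true", "string": str,
}

def parse_node_header(header_line):
    """Split a node header such as 'personId:ID(Person),name,age:int,:LABEL'
    into (property_name, kind) pairs."""
    fields = []
    for raw in next(csv.reader([header_line])):
        name, _, kind = raw.partition(":")
        # Drop an ID-space suffix like ID(Person); untyped columns are strings.
        kind = kind.split("(")[0] or "string"
        fields.append((name, kind))
    return fields

def build_merge_query(fields, label):
    """Build one UNWIND/MERGE statement keyed on the header's ID column.
    Assumption: every row in the file gets the same, fixed label."""
    id_name = next(name for name, kind in fields if kind == "ID")
    return (
        "UNWIND $rows AS row\n"
        f"MERGE (n:`{label}` {{`{id_name}`: row.id}})\n"
        "SET n += row.props"
    )

def rows_from_csv(fields, data_text):
    """Turn CSV data lines into the parameter rows the query expects."""
    rows = []
    for record in csv.reader(io.StringIO(data_text)):
        row = {"id": None, "props": {}}
        for (name, kind), value in zip(fields, record):
            if kind == "ID":
                row["id"] = value  # kept as a string in this sketch
            elif kind in CONVERTERS and value != "":
                row["props"][name] = CONVERTERS[kind](value)
            # Other columns (e.g. :LABEL) are ignored in this sketch.
        rows.append(row)
    return rows

# Made-up header and data, standing in for one "subset" file.
header = "personId:ID(Person),name,age:int,:LABEL"
data = "p1,Ada,36,Person\np2,Grace,,Person\n"

fields = parse_node_header(header)
query = build_merge_query(fields, "Person")
rows = rows_from_csv(fields, data)
```

With the official Python driver, the batch could then be applied with something like `session.run(query, rows=rows)`. Relationship files would presumably need a similar function that matches on the `:START_ID` and `:END_ID` columns before merging the relationship.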

My questions:

  • Are there best practices around flows that can both keep the "bulk importable data" updated, as well as keeping a live db updated?

  • Are there best practices around organizing your bulk importable data that make this easier (e.g. filenames or folders?)

  • Any other advice?

Thanks and regards
