Good afternoon!
My question is about whether Neo4j is suitable for what I need. I think it may be, but some uncertainty about Neo4j's capabilities is making me second-guess it. You can skip ahead to "Here's where I start to wonder..." if the context/background below is too much.
In a nutshell, I need to establish the database framework for kitting out different fleets of vehicles, with full sourcing and attribution of data. It'll have a front end where a user can select from all of the many options to "build" vehicles and then add them to one or more fleets. In the end, I'd have a DB of organizations, the fleets they own, and the composition of each of those fleets all the way down to each specific vehicle, with its equipped systems, performance data, and other relevant info. Users would be able to view and edit the data, and other software would be able to query the DB for inputs to run simulations and do analysis.
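To make that structure concrete, here is a rough sketch of the hierarchy I have in mind. It's plain Python standing in for a graph, not Neo4j code, and the labels and relationship types (`Organization`, `Fleet`, `Vehicle`, `System`, `OWNS`, `INCLUDES`, `EQUIPPED_WITH`) as well as all of the data are placeholders I made up. In Cypher I'd expect the same chain to look something like `(:Organization)-[:OWNS]->(:Fleet)-[:INCLUDES]->(:Vehicle)-[:EQUIPPED_WITH]->(:System)`.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a property graph: each node has a label and
# properties; each edge is a typed relationship from one node to another.
nodes = {}
edges = defaultdict(list)  # node id -> list of (relationship type, target node id)


def add_node(node_id, label, **props):
    nodes[node_id] = {"label": label, **props}


def relate(src, rel_type, dst):
    edges[src].append((rel_type, dst))


def neighbors(node_id, rel_type):
    """Targets of all outgoing relationships of the given type."""
    return [dst for rel, dst in edges[node_id] if rel == rel_type]


# Organization -> Fleet -> Vehicle -> System (all names and VINs are placeholders).
add_node("org:1", "Organization", name="Example Motorsports")
add_node("fleet:us", "Fleet", name="US fleet", region="US")
add_node("fleet:eu", "Fleet", name="EU fleet", region="EU")
add_node("veh:1", "Vehicle", model="Camry", vin="VIN-PLACEHOLDER-1")
add_node("veh:2", "Vehicle", model="Ascent", vin="VIN-PLACEHOLDER-2")
add_node("veh:3", "Vehicle", model="Civic", vin="VIN-PLACEHOLDER-3")
add_node("sys:xyz", "System", name="Surround sound model XYZ")

relate("org:1", "OWNS", "fleet:us")
relate("org:1", "OWNS", "fleet:eu")
relate("fleet:us", "INCLUDES", "veh:1")
relate("fleet:us", "INCLUDES", "veh:2")
relate("fleet:eu", "INCLUDES", "veh:3")
relate("veh:1", "EQUIPPED_WITH", "sys:xyz")


def fleet_composition(org_id, region):
    """(model, VIN, [system names]) for every vehicle in the org's fleets in `region`."""
    result = []
    for fleet in neighbors(org_id, "OWNS"):
        if nodes[fleet].get("region") != region:
            continue
        for veh in neighbors(fleet, "INCLUDES"):
            systems = [nodes[s]["name"] for s in neighbors(veh, "EQUIPPED_WITH")]
            result.append((nodes[veh]["model"], nodes[veh]["vin"], systems))
    return result


def vehicles_with_system(system_id):
    """VINs of every vehicle with an EQUIPPED_WITH relationship to `system_id`."""
    return [nodes[v]["vin"] for v in nodes
            if nodes[v]["label"] == "Vehicle"
            and system_id in neighbors(v, "EQUIPPED_WITH")]


print(fleet_composition("org:1", "US"))
# prints [('Camry', 'VIN-PLACEHOLDER-1', ['Surround sound model XYZ']), ('Ascent', 'VIN-PLACEHOLDER-2', [])]
```

The reverse lookup `vehicles_with_system` is roughly the shape of a question like "which specific vehicles have system XYZ?", just walking the same relationships in the other direction.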
I need to be able to answer questions and do things like the following (among others):
- "What is the max speed of a 2022 Subaru Ascent Onyx Edition?"
- "Which models have Starlink as an option?"
- "Which specific Camrys are equipped with a model XYZ surround sound system?"
- "What are the VINs for all 2019 Tesla Model Xs manufactured?"
- "What is Hendrick Motorsports' US-based fleet composed of?"
- "Who entered this datum?"
- "When was this datum entered?"
- "What is the source of this datum?" (think MLA citation)
- "What is the classification of this datum?" (think public vs proprietary)
- Highlight data that is over two years old.
- Filter out (or display) data at a given classification level (e.g. hide all proprietary data).
- Generate a "baseball card" that summarizes key data. Picture a slide or word doc.
- Have a separate piece of software query for fleet and vehicle data and then run a sim.
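Most of the provenance items in that list boil down to every value carrying its own metadata. Here is a minimal sketch of that idea, assuming each datum records who entered it, when, a citation, and a classification. The field names and sample records are hypothetical, and "over two years old" is approximated as 730 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Datum:
    """One recorded value plus its provenance (hypothetical field names)."""
    subject: str
    attribute: str
    value: object
    entered_by: str        # who entered it
    entered_on: date       # when it was entered
    source: str            # citation for where it came from
    classification: str    # e.g. "Public" or "Honda Proprietary"


TWO_YEARS = timedelta(days=730)  # approximation that ignores leap days


def is_stale(datum, today):
    """True if the datum was entered more than ~two years before `today`."""
    return today - datum.entered_on > TWO_YEARS


def visible(data, hidden_classifications):
    """Drop any datum whose classification is in `hidden_classifications`."""
    return [d for d in data if d.classification not in hidden_classifications]


# Placeholder records; the values and citations are made up for illustration.
data = [
    Datum("Example Vehicle", "max_speed_mph", 100, "analyst1", date(2021, 3, 1),
          "Example public spec sheet", "Public"),
    Datum("Example Vehicle", "max_speed_mph", 105, "analyst2", date(2023, 9, 15),
          "Example engineering report", "Honda Proprietary"),
]

today = date(2024, 1, 1)
for d in visible(data, {"Honda Proprietary"}):
    flag = "STALE" if is_stale(d, today) else "ok"
    print(d.value, d.entered_by, d.entered_on, d.source, flag)
```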
Here's where I start to wonder if Neo4j can do it:
- I must be able to show aggregated information for the same field from many different sources and classifications (of both the source document and the data itself).
  - E.g. source "A" (a public website) says that a 2022 Honda Civic's mpg rating is 34 mpg. Meanwhile, source "B" (a Honda proprietary engineering document) says it's only 31 mpg. And source "C" (marked overall as Toyota Proprietary, though the specific data point is unclassified) says it's 32.5 mpg.
- To complicate things further, sometimes not only the data itself but the very fact of a relationship can be proprietary. I have to be able to ensure that compiling certain data together doesn't accidentally raise the classification of the result.
  - E.g. "Toyota Camry" is fine on its own, and "XV70" is fine on its own. But the fact that the XV70 IS a Toyota Camry may have been proprietary before it was publicly announced. This is common with project code names.
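The way I picture handling both cases is to make each claim its own node instead of a property on the vehicle. As far as I understand, a Neo4j relationship can hold properties but can't itself be the start or end of another relationship, so a link like "XV70 is a Toyota Camry" that needs its own source and classification might have to be promoted to a node too. Here is a rough Python sketch of that shape; the class names, the `IS_A` type, and "Source D" are my own placeholders, while the mpg values are the ones from the Civic example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    classification: str  # marking on the source document as a whole


@dataclass(frozen=True)
class Claim:
    """One source's value for one attribute. In a graph this would be its own node,
    linked to the vehicle node and to the Source node (hypothetical modeling)."""
    subject: str
    attribute: str
    value: float
    source: Source
    classification: str  # marking on this specific data point


@dataclass(frozen=True)
class LinkClaim:
    """A relationship promoted to a node so it can carry its own source and marking."""
    from_node: str
    rel_type: str
    to_node: str
    source: Source
    classification: str


src_a = Source("Source A (public website)", "Public")
src_b = Source("Source B (Honda engineering document)", "Honda Proprietary")
src_c = Source("Source C", "Toyota Proprietary")
src_d = Source("Source D (hypothetical internal document)", "Toyota Proprietary")

# Conflicting values are kept side by side rather than overwriting each other.
claims = [
    Claim("2022 Honda Civic", "mpg", 34.0, src_a, "Public"),
    Claim("2022 Honda Civic", "mpg", 31.0, src_b, "Honda Proprietary"),
    Claim("2022 Honda Civic", "mpg", 32.5, src_c, "Unclassified"),
]

# Code-name case: each endpoint is harmless alone; the link itself carries the marking.
links = [LinkClaim("XV70", "IS_A", "Toyota Camry", src_d, "Toyota Proprietary")]


def all_values(subject, attribute):
    """Every recorded value for one field, with its source and both markings."""
    return [
        (c.value, c.source.name, c.source.classification, c.classification)
        for c in claims
        if c.subject == subject and c.attribute == attribute
    ]


for row in all_values("2022 Honda Civic", "mpg"):
    print(row)
```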
I don't think I can just have tables of attribute values, because each attribute only supports one value and there are many instances of conflicting information that I have to keep track of. Also, classification isn't a simple attribute. There are many types of controlled information, such as organizational (Tesla Proprietary vs. Honda Proprietary), Personally Identifiable Information (PII), or confidential legal documents. I have to be able to associate an organization node (e.g. Tesla) with a classification level node (e.g. Proprietary) to declare what a particular datum's classification is (Tesla Proprietary). Classification also determines who the data is releasable to, which we have to track so we can dynamically control who gets to see what based on their credentials (e.g. a Tesla employee is allowed to see Tesla Proprietary data but not Toyota Proprietary data) and therefore stay out of massive legal trouble.
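Here's roughly how I picture the releasability check, assuming a marking is an (organization, category) pair and a user may see a datum only if their credentials cover every marking on it. That rule is my own simplification and the names are hypothetical; in the graph, each marking would presumably be a node linked to an organization node and a category node.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Marking:
    """A control marking formed by pairing an organization with a category,
    e.g. Marking("Tesla", "Proprietary"). Names are hypothetical."""
    organization: str
    category: str  # e.g. "Proprietary", "PII", "Legal-Confidential"


@dataclass(frozen=True)
class Datum:
    description: str
    markings: FrozenSet[Marking] = frozenset()  # empty set means public


@dataclass(frozen=True)
class User:
    name: str
    access: FrozenSet[Marking]  # markings this user's credentials cover


def releasable(datum, user):
    """Visible only if the user's credentials cover every marking on the datum."""
    return datum.markings <= user.access


TESLA_PROP = Marking("Tesla", "Proprietary")
TOYOTA_PROP = Marking("Toyota", "Proprietary")

data = [
    Datum("Public spec sheet value"),
    Datum("Tesla engineering value", frozenset({TESLA_PROP})),
    Datum("Toyota engineering value", frozenset({TOYOTA_PROP})),
]
tesla_employee = User("tesla_employee", frozenset({TESLA_PROP}))

for d in data:
    if releasable(d, tesla_employee):
        print(d.description)
```

This doesn't touch the harder compilation problem, where combining individually releasable data produces something that isn't.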
My initial research seemed to rule out relational databases entirely and suggested that Neo4j might support what I need, but I'm not sure it does. The closest thing I've been able to find is this post (in which the answer is "no", but I'm not quite sure whether we're asking the same thing):
Solved: Relationship Properties - Neo4j - 33804
I'm pretty new to database design and completely new to Neo4j. My background is primarily in modeling and simulation, but I'm frothing at the mouth for an opportunity to establish a central DB my many models can query instead of having to hand-jam all this crap into each one separately, which introduces plenty of risk and wastes time.
Long post, I know, but I'd really appreciate any feedback as I get started on my databasing journey! Thank you very much!