Questions about designing database for a digital asset management system

Bit of backgorund

Hello all, I am new to Neo4J community and in database design in general.

I am a Visual Effects Technical Director based out of London and my day job usually includes creating pictures, tools etc for visual effects in films. I am planning to create an Asset Management system which allows me to iterate much faster and manage the assets in optimal manner.

I have scoured the internet for different database platforms both SQL and NoSQL based but never really felt that these tools aligned in a way I thought about data. I happened run into graph based semantic databases and discovered Neo4J which I think is perfect for the way I want to create my Asset management system.

Since I am fairly new and doing this by myself in my own time, please excuse any noob questions I may have thrown

With that background out the way, I want to actually start throwing questions now.

Primary objectives for AMS

I appreciate that this is quite a daunting task and have set some realistic expectations for the database at the moment. The primary objective of the database right now will be to efficiently resolve asset paths on disks, their dependencies, metadata (who created the assets, when was it created, department etc).

I also want to write few applications which will allow me to analyse this data and put it in production context (how long the iteration of each asset takes, how many assets have been generated etc) which also gives it Task management like properties. This is also an important aspect for me. Which leads me to quite a few design possibilities and I would love to get your opinions and thoughts on this.

  • Should I create one database which handles both the tasks and assets or should I have one database for Assets and another for Tasks.

  • Keeping them together in same database gives me a richer contextual view but is it going to be performant when the tasks and asset count increases dramatically. (VFX is highly iterative process and usually has multiple versions of each assets. There also are huge number of tasks cumulatively)

  • To work around this I thought what if I could separate these databases and have them communicate with each other. So anytime a task gets created in the database handling tasks, the asset management database is made aware of this. Is this over-engineering things? What would be better ways to handling these scenarios?

  • What might be the common pitfalls and noob mistakes when implementing something like this?

  • The plan I have at the moment is to model a database using Neo4j which will be running on a central server (I plan to use a NAS). Ideally there will be multiple users (only two at the moment however, it should scale well with more users) accessing the database using different machines via the official python drivers. How efficient/fast is it going to be to write a python application which is used to interface with the database.

I am sure some of these questions might be super basic however I wanted to get these out here on the forum so I can learn from the community. I look forward to your suggestions and opinions on this.

Cheers,
Bhavesh.

Hello Bhavesh,

Welcome to the Community!

Your project is a great fit for Neo4j. A Neo4j database can handle billions of nodes and relationships (Enterprise Edition) so my suggestion would be to have all data in one database, especially if the relationships between Tasks and Assets are to be transactionally consistent.

The design of your graph model will be an important step where you will want to iteratively model and test the performance against realistic data.

Perhaps others in the community can chime in here, especially if they have developed a graph that is similar to your use case.

Elaine

Hello Elaine, thanks for your thoughts!

I have few more questions related to the graph model. Would be great to have people's view on these.

  • Lets say I have various class/types of nodes each reflecting the type of the asset, most of these will have a uid property which I mean to use for indexing. I have read online about exercising caution when it comes to indexing. So I wanted to make sure if I am approaching this correctly. Can we index different class/node types to a property? (I will most likely have 50-100 class types)
    For example, lets assume there are 4 different types (not instances):

    • Asset X, Y, Z and W with properties {name, uid, ...}, is it good practice to index these on uid property (provided the uids are unique)? Also Do we need to create these indices when setting up the database or can we add them later if some newer class types are also introduced?
  • I also have been looking at implementing some semblance of versioning for these node/asset instances. I came across few blog posts from Ian Robinson which use notion of linked lists and timestamped relationships to achieve this. Are there any other approaches to this or is this still the preferred approach?
    In a VFX production environment we typically have hundreds of unique versions per asset and I wanted to identify some good and efficient approaches to achieving this. So far I have only seen the posts from Ian addressing this.

I look forward to the community's opinion and thoughts on this.

Many thanks,
Bhavesh.

I have actually come across another interesting blog post from @tomzeppenfeldt about Network versions which seems like a viable alternate to Ian's method since I need to implement versioning per asset and preferably preserve the network hierarchy for each version (a different version of same asset can have a new dependency or have fewer dependencies).

Would be really great if anyone with any experience from similar projects could chime in!