Are property-dense nodes in Neo4j problematic in a way similar to denormalized tables in SQL?
I have large amounts of aggregated properties I need to store for a lot of smaller subgraphs (around 150 properties per subgraph; too many to aggregate on the fly, and there will be lots of reads of these properties). I am contemplating storing them on a separate 'aggregated stats' node and linking it to the rest of the subgraph with a relationship, something like this:
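A rough sketch of what I mean (the Subgraph and AggregatedStats labels, the HAS_STATS relationship type, and the property names are just placeholders):

```cypher
// Attach precomputed aggregates to a dedicated stats node
// instead of piling ~150 properties onto the subgraph's root node.
MATCH (root:Subgraph {id: 42})
MERGE (root)-[:HAS_STATS]->(s:AggregatedStats)
SET s.avgDegree = 3.7,
    s.nodeCount = 1250,
    s.lastComputed = datetime();
```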
I am wary that this might hurt read times or go against best practices in some way. Would this use case be problematic for Neo4j? We're considering using MongoDB instead. I am asking here because I am unsure whether my suspicions are warranted. Any help on this would be greatly appreciated.
I don't know of anything that would make this problematic per se. Somewhat related: it is considered an anti-pattern to store really large property values (for example, a 20 MB video as a byte array on a property). But having lots of properties is fine.
The thing about having lots of properties on your nodes, though, is that it's a "modeling smell". Nothing is wrong per se, but if you have 20 properties, I'll bet some of them are categorical variables (like gender=M/F, or color=red/blue/green). And if you have categorical variables, graph modeling folks may ask why you made them property values rather than separate nodes linked by relationships.
Suppose you have a "color" property with a domain of 200 possible colors, and 1 million products, each of which has a color. You can either put color=green as a property on every product node, or model 200 color nodes and link each product to its color. The second option better exploits the graph model and lets you run all kinds of other queries faster.
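A minimal Cypher sketch of the two options (the Product and Color labels and the HAS_COLOR relationship type are hypothetical names for illustration):

```cypher
// Option 1: color as a property on every product node
CREATE (:Product {name: 'Widget', color: 'green'});

// Option 2: color as a separate node linked by a relationship
MERGE (c:Color {name: 'green'})
CREATE (p:Product {name: 'Widget'})
CREATE (p)-[:HAS_COLOR]->(c);

// With option 2, "all green products" becomes a relationship
// traversal from one Color node, rather than a property scan
// over a million Product nodes:
MATCH (p:Product)-[:HAS_COLOR]->(:Color {name: 'green'})
RETURN p;
```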
So ultimately this will depend on the semantics of your model, but I would reconsider having so many properties per node: not because it's bad for Neo4j, but because you may be leaving an opportunity on the table to improve your query speed and model comprehensibility.
I heard from other folks that separating out groups of correlated properties into separate nodes helped them a lot, both with modeling and with performance.
You can kinda see this as breaking a huge entity down into an aggregate with separate sub-parts for the less related bits. It's also a bit like decomposing a document.
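For instance, something like this (a hypothetical sketch; the labels, relationship types, and properties are made up for illustration):

```cypher
// Pull correlated property groups out of one huge node
// into linked sub-nodes.
CREATE (p:Product {sku: 'ABC-123'})
CREATE (p)-[:HAS_DIMENSIONS]->(:Dimensions {widthCm: 10, heightCm: 4, depthCm: 2})
CREATE (p)-[:HAS_PRICING]->(:Pricing {listPrice: 19.99, currency: 'EUR'});
```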
Thanks for the replies. I think we will stick with Neo4j for now. We cannot factor the properties into nodes and relationships, though, as they are computationally heavy aggregations computed from the subgraphs in the database, so they are not generic to a broader set of nodes or other subgraphs.