Property dense nodes

Martin · November 20, 2018, 12:25pm

Are property-dense nodes in Neo4j problematic in the same way that denormalized tables are in SQL?

I have a large number of aggregated properties that I need to store for a lot of smaller subgraphs (around 150 properties per subgraph; too many to aggregate on the fly, and these properties will be read often). I am contemplating storing them on a separate 'aggregated stats' node and linking it to the rest of the subgraph with a relationship.

I am wary that this might hurt read times or go against best practices. Would this use case be problematic for Neo4j? We're looking into maybe using MongoDB instead. I am asking here because I am unsure whether my suspicions are warranted. Any help on this would be greatly appreciated.
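To make the idea concrete, here is a rough sketch of how the separate stats node could be written. It only builds a parameterized Cypher statement and its parameters; it does not connect to a database. The labels `SubgraphRoot` and `AggregatedStats`, the relationship type `HAS_STATS`, the `id` property, and the helper `build_stats_upsert` are all made-up names for illustration, not part of any existing model.

```python
from typing import Any, Dict, Tuple

# Hypothetical labels, relationship type, and key property; adjust to the real model.
STATS_QUERY = (
    "MATCH (root:SubgraphRoot {id: $root_id}) "
    "MERGE (root)-[:HAS_STATS]->(stats:AggregatedStats) "
    "SET stats += $props"
)


def build_stats_upsert(root_id: str, aggregates: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return a parameterized Cypher statement plus its parameters."""
    # Property values cannot be nested maps; for simplicity this sketch
    # keeps only flat scalar values and silently drops everything else.
    props = {
        key: value
        for key, value in aggregates.items()
        if isinstance(value, (bool, int, float, str))
    }
    return STATS_QUERY, {"root_id": root_id, "props": props}


query, params = build_stats_upsert(
    "sg-1",
    {"avg_degree": 3.2, "node_count": 41, "nested": {"not": "allowed"}},
)
```

The statement and parameters would then be handed to whatever driver is in use. All 150 aggregates travel as one `$props` map, so the statement itself stays the same no matter how many properties there are.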

david_allen · November 20, 2018, 4:36pm

I don't know of anything that would make this problematic per se. Somewhat related: it is considered an anti-pattern to have really large property values (for example, storing a 20 MB video as a byte array on a property). But it's fine to have lots of properties.

The thing about having lots of properties on your nodes, though, is that it's a "modeling smell". There's nothing wrong with it per se, but if you have 20 properties, I'll bet some of them are categorical variables (like gender=M/F, or color=red, blue, green). And if you have categorical variables, graph modeling folks may ask you why you made each one a property value rather than a separate node linked by relationships.

Suppose you have a "color" property, and you have a domain of 200 possible colors, and 1 million products, each of which has a color. You can model this as 200 color nodes and links from all of the products to their color, which better exploits the graph model and lets you do all kinds of other queries faster, or you can put color=green as a property on every node.
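As a toy illustration of the two options (plain Python dictionaries, not how Neo4j actually stores or queries data), the difference looks roughly like this. The product IDs and the function names `by_property` and `by_color_node` are invented for the example.

```python
from collections import defaultdict

# Made-up sample data: (product id, color).
products = [("p1", "green"), ("p2", "red"), ("p3", "green")]

# Option A: color stored as a property on every product node.
product_nodes = {pid: {"color": color} for pid, color in products}


def by_property(color):
    # With no index to help, every product has to be inspected.
    return sorted(pid for pid, props in product_nodes.items() if props["color"] == color)


# Option B: one node per color, with a link from each product to its color.
color_to_products = defaultdict(set)
for pid, color in products:
    color_to_products[color].add(pid)


def by_color_node(color):
    # Start at the single color node and follow its links.
    return sorted(color_to_products.get(color, ()))


green_a = by_property("green")
green_b = by_color_node("green")
```

Both functions return the same products. The second one starts from the color and only touches the products linked to it, which is the kind of traversal the color-node model is meant to enable.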

So ultimately this will depend on the semantics of your model, but I would reconsider having so many properties per node, not because it's bad for Neo4j, but because you may be leaving some opportunity on the table to improve your query speed & model comprehensibility.

michael.hunger · November 21, 2018, 10:57am

I heard from other folks that separating out groups of correlated properties into separate nodes helped them a lot both with modeling and performance.

You can kinda see this as a breakdown of a huge entity into an aggregate with separate sub-parts for the less related bits. Also kinda like a document decomposition.

I think it's fine.

Martin · November 21, 2018, 4:17pm

Thanks for the replies. I think we will stick with Neo4j for now. We cannot factor the properties out into nodes and relationships, though, because they are computationally heavy aggregations computed from the subgraphs in the database, so they are not shared with a broader set of nodes or other subgraphs.
