Any number of labels from one to infinity is possible, but best practice is not more than 4. Why?

In Neo4j, a node can have zero, one, or many labels.

But best practice is "not more than 4".

How come? And how do you arrive at this specific number?

Hi,

I went back to the team to ask about this.

The original advice re 4 labels stems from:

Internally, in the node store, a node can fit about 3 to 5 labels before they go into the label overflow store. The idea was to avoid the overflow, which means following a few more memory pointers to get the label information during traversal. When thinking about how many labels is too many, keep in mind that labels are used to optimize traversal: through the label token lookup index, through user-created indexes that use the label, or through filtering during navigation. In general, if a label is used to optimize performance, then having more than 4 labels is justified. The improved performance from reading less of the graph outweighs the cost of accessing the labels.
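
As a rough illustration of the "reading less of the graph" point, here is a toy Python sketch. It is not how Neo4j stores or indexes labels; the node data, the labels, and the helper names (scan_all, build_label_index) are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy graph: node id -> set of labels (not Neo4j's storage format).
nodes = {
    1: {"Person"},
    2: {"Person", "Employee"},
    3: {"Company"},
    4: {"Person", "Employee", "Manager"},
    5: set(),  # a node may have zero labels
}

def scan_all(nodes, label):
    """Check every node for the label; the work grows with the whole graph."""
    checked = 0
    hits = []
    for node_id, labels in nodes.items():
        checked += 1
        if label in labels:
            hits.append(node_id)
    return hits, checked

def build_label_index(nodes):
    """Map each label to the ids of the nodes carrying it (a stand-in for a label lookup)."""
    index = defaultdict(set)
    for node_id, labels in nodes.items():
        for label in labels:
            index[label].add(node_id)
    return index

label_index = build_label_index(nodes)

# Full scan touches all 5 nodes; the label lookup goes straight to the matches.
scan_hits, scan_checked = scan_all(nodes, "Manager")
index_hits = sorted(label_index["Manager"])
```

In this toy model a selective label such as "Manager" pays for itself because the lookup skips most of the nodes, which is the kind of trade-off described above.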

From my perspective, another angle is that if you are adding lots of labels it may be time to think about your data model, but this feels like a minor point.

It feels like describing "not more than 4" as "best practice" could give the wrong impression, so I am going to change the course content.

Martin


Hi Martin,

thank you!
"from my perspective another angle is probably that if you are adding lots of labels it may be time to think about your data model" --> I agree.

I do have some more "problematic" questions:

  • What's the best way to depict time? How many relationships such as "Worked here from 20xx to 20xx" or "Payment" are "ok", and when is it better to switch from adding more relationships between 2 nodes to refactoring the model?
  • How many properties are "best" in a Node, and at which number is it better to create them as Nodes of their own?

I am only asking for a "ballpark". I have no experience. If it is not possible ("it depends...") I understand.

Bye

Michael

I am going to go with "it depends"!

Some thoughts based on use case - but they are nothing more than that and I would expect others to have different opinions.

What's the best way to depict time? How many relationships such as "Worked here from 20xx to 20xx" or "Payment" are "ok", and when is it better to switch from adding more relationships between 2 nodes to refactoring the model?

I don't think there is anything wrong with multiple relationships between nodes, particularly if you're looking to understand or traverse those relationships.

Employing dateFrom and dateTo as properties on relationships is a typical and useful way to understand relationships.
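
A minimal Python sketch of that idea, assuming two relationships of the same type between the same pair of nodes, each carrying dateFrom and dateTo. The names (Alice, Acme, WORKED_AT, active_on) and the convention that a missing dateTo means "still ongoing" are assumptions for illustration, not something from Neo4j itself.

```python
from datetime import date

# Hypothetical data: two WORKED_AT relationships between the same two nodes,
# distinguished only by their dateFrom/dateTo properties.
relationships = [
    {"start": "Alice", "type": "WORKED_AT", "end": "Acme",
     "dateFrom": date(2010, 1, 1), "dateTo": date(2013, 6, 30)},
    {"start": "Alice", "type": "WORKED_AT", "end": "Acme",
     "dateFrom": date(2018, 3, 1), "dateTo": None},  # None = still ongoing (assumption)
]

def active_on(rels, day):
    """Return the relationships whose date range contains the given day."""
    return [
        r for r in rels
        if r["dateFrom"] <= day and (r["dateTo"] is None or day <= r["dateTo"])
    ]

first_stint = active_on(relationships, date(2012, 5, 1))
gap = active_on(relationships, date(2015, 1, 1))
current = active_on(relationships, date(2020, 1, 1))
```

Each stint stays a separate relationship, so a question such as "where did this person work on a given day?" becomes a filter on the relationship properties rather than a change to the model.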

How many properties are "best" in a Node, and at which number is it better to create them as Nodes of their own?

I wouldn't want to impose an artificial limit on the number of properties in a node... an indicator for me that I should start to create new nodes and relationships is when I start to see duplicate data and new entities.
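
To make the "duplicate data" signal concrete, here is a small Python sketch under made-up assumptions: Person records that repeat an employer value as plain text, and a hypothetical helper extract_property that pulls that property out into its own nodes plus relationships. None of the names come from Neo4j.

```python
# Hypothetical Person nodes that repeat the same employer value as a property.
people = [
    {"name": "Ann", "employer": "Acme"},
    {"name": "Bob", "employer": "Acme"},
    {"name": "Cy", "employer": "Globex"},
]

def extract_property(records, key, rel_type):
    """Split a repeated property into its own nodes plus relationships."""
    # One new node per distinct value of the property.
    new_nodes = sorted({r[key] for r in records})
    # One relationship from each original record to the node for its value.
    rels = [(r["name"], rel_type, r[key]) for r in records]
    # The original records, now without the extracted property.
    slim = [{k: v for k, v in r.items() if k != key} for r in records]
    return slim, new_nodes, rels

slim_people, companies, works_at = extract_property(people, "employer", "WORKS_AT")
```

Here "Acme" appears on two records, so it ends up as a single node with two incoming relationships instead of two copies of the same string.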

My overriding advice would be to remember that your model can change and adapt (easily). Build for day 1 and expect to iterate. Simple is usually best, particularly when starting.
