For our custom Cypher procedures, we would see a significant performance gain if the order of labels on nodes were guaranteed. For example, let's say we have the following labels:
Animal, Cat, Dog
We would want the most specific label to occur first for a Dog:
Dog, Animal
And same for a Cat:
Cat, Animal
We have already found that we can achieve this by creating dummy nodes in the database, since the order of the labels seems to be determined by the order in which the labels were first created. But we are wondering:
- Is this the correct way to do this? It seems like a hack.
- Is there a more efficient way to do this without creating dummy nodes?
Cheers,
Boyen van Gorp
There isn't a way to guarantee ordering on labels. Labels aren't really an array (arrays can be ordered). It's better to think of them as a set of labels, and sets don't have an implicit order. Labels are either present or absent (set membership).
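To make the set-membership idea concrete, here is a minimal sketch in plain Java. It is not Neo4j code: the class LabelMembership and its hasLabel method are hypothetical, and plain strings stand in for labels. It only illustrates that a presence check gives the same answer whatever order the labels arrive in.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; plain strings stand in for Neo4j labels.
final class LabelMembership {
    // True when the label is present, wherever it sits in iteration order.
    static boolean hasLabel(Set<String> labels, String wanted) {
        return labels.contains(wanted);
    }

    public static void main(String[] args) {
        // The same two labels, delivered in two different orders.
        Set<String> first = new LinkedHashSet<>(List.of("Dog", "Animal"));
        Set<String> second = new LinkedHashSet<>(List.of("Animal", "Dog"));

        System.out.println(hasLabel(first, "Dog"));   // true
        System.out.println(hasLabel(second, "Dog"));  // true
        System.out.println(first.equals(second));     // true: set equality ignores order
    }
}
```

Inside an actual procedure, the equivalent check would be Node.hasLabel(Label.label("Dog")) from the Neo4j Java API, which likewise does not depend on where the label appears.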
Wanting the most specific label to "appear first" sounds more like a requirement to display the data a certain way. When you say "appear first", do you mean in the context of Bloom, Browser, or something else?
If I had a particular display requirement, I might consider storing a property on the node that describes how you want it displayed, and then using the built-in features of Bloom, Browser, or other custom software to drive the display from that property, rather than changing the labels themselves or using any dummy nodes.
As further food for thought, it can be tricky to use labels to model subclass/superclass relationships, as you have done with Animal -> Cat and Animal -> Dog. For an extended treatment of labels and why class hierarchies can be problematic, take a look at this: Graph Modeling: Labels. What are labels for, and how can you… | by David Allen | Neo4j Developer Blog | Medium
We are using the labels in a Neo4j plugin, where we are writing our own custom procedures similar to APOC.
If we could get the most important labels to appear first every time, it would allow us to optimize performance, because we would not need to traverse the array of labels. With large numbers of nodes this could be significant.
But there is something to be said for simply reducing the number of labels. Even though Animal doesn't seem orthogonal, in our case it is, because we also have virtual animals that can be cats or dogs as well.
It did seem like we could get this to work by first creating a dummy node with the Dog/Cat label and then a dummy node with the Animal label. All subsequent nodes would have their labels ordered by the order in which the dummy labels were created.
Is the above not intended behavior? By that I mean, would it be bad practice to program against it?
Ordering on labels isn't really intended one way or the other. It's more by chance, or an underlying implementation detail of the database that you can't rely upon.
Some things are explicit in Cypher, and other things aren't. For example, if you say this:
MATCH (p:Person) RETURN p.name
Is the ordering of the results guaranteed? No, because you didn't specify an order. From run to run it might be consistent, but that's just a detail you shouldn't rely upon. If you use an ORDER BY clause, the ordering is reliable, consistent, and stable. Node labels are like that. You can experiment and see if you can get the system to return them in a particular order, but Neo4j isn't promising that this is an ordered list (it's more like a set), so if you come to rely on that ordering without being explicit about it, there isn't any guarantee it'll remain stable.
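The label-side analogue of ORDER BY is to state the ordering yourself instead of reading it off position. Below is a rough sketch in plain Java under stated assumptions: the class LabelRanking, its RANK table, and mostSpecific are all hypothetical names, and strings stand in for labels. It picks the "most specific" label from an explicit rank, so the result no longer depends on the order in which labels come back.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration; the rank table is an assumption.
final class LabelRanking {
    // Lower rank means more specific. Labels missing from the table sort last.
    static final Map<String, Integer> RANK = Map.of("Dog", 0, "Cat", 0, "Animal", 1);

    // Picks the lowest-ranked label; ties are broken alphabetically so the
    // result never depends on the order of the input.
    static Optional<String> mostSpecific(Collection<String> labels) {
        return labels.stream().min(
            Comparator.<String>comparingInt(l -> RANK.getOrDefault(l, Integer.MAX_VALUE))
                      .thenComparing(Comparator.<String>naturalOrder()));
    }

    public static void main(String[] args) {
        System.out.println(mostSpecific(List.of("Animal", "Dog")).orElse("none")); // Dog
        System.out.println(mostSpecific(List.of("Dog", "Animal")).orElse("none")); // Dog
    }
}
```

This still walks the labels of a node, but the walk is bounded by the number of labels on that node, and the outcome stays the same if the database ever returns them in a different order.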
David, thanks for all the help, we've decided to do as you suggested and not rely on this mechanic.