I do believe re-usability in a graph structure should promote uniqueness for attributes (at least on theoretical grounds), but I'm really taking my first steps with Neo4j.
So I was wondering if more experienced developers could tell me about the pros and cons of using nodes instead of keys, and putting the values (and any units) on the relationship that ties them together?
Hi Jonas,
interesting post and indeed very similar to my question from yesterday.
What do you mean by "modular", and why are you talking about classes?
In a labeled property graph, there are no classes (per se). There are nodes with one or more labels, and these labels correspond to classes in just one respect: they make it possible to refer to exactly the group of nodes with that label, as if they were "instances of a class", but the possible properties of those nodes don't depend in the least on the label. I think it's better to avoid talking about classes for that reason.
Technically it's perfectly valid to put values (and units) as properties on an edge that links to a node carrying the "meaning" of this pair. However, this way you get a lot of "super nodes" with lots of incoming links, like (:Attr {name:"temperature"}), which may be problematic for querying and visualization.
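To make that concrete, here is a minimal sketch of the "value on the edge" model. The relationship type HAS_VALUE and the second container are hypothetical names chosen for illustration; only the :Attr node and the first container come from this thread.

```cypher
// Assumption: HAS_VALUE is an illustrative relationship type, not an established convention.
// One shared attribute node; values and units live on the relationships.
MERGE (a:Attr {name: "temperature"})
CREATE (c1:FluidInContainer {name: "My cup of hot water"})
CREATE (c2:FluidInContainer {name: "My glass of cold water"})
CREATE (c1)-[:HAS_VALUE {value: 25, unit: "°C"}]->(a)
CREATE (c2)-[:HAS_VALUE {value: 5, unit: "°C"}]->(a);

// Every reading in the database hangs off the same :Attr node,
// so its number of incoming relationships grows with the data.
MATCH (a:Attr {name: "temperature"})<-[r:HAS_VALUE]-(c)
WHERE r.value > 20
RETURN c.name, r.value, r.unit;
```

The second query has to walk the relationships of the single "temperature" node and filter on a relationship property, which is where the super-node concern would show up as the data grows.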
In "polygon", an ancient graph-based information system from the late 1990s, the modeling went like this:
(:FluidInContainer {name:"My cup of hot water"})-[:HAS_TEMPERATURE]->(:ValueNode {value:25,unit:"°C"})
This is something I had in mind in my other post. This has the charming side effect that you have a kind of reverse index right there inside the graph. Example:
MATCH (p:Person)-[:HAS_GIVENNAME]->(v:ValueNode) WHERE v.value = "Christoph" RETURN p
But frankly I myself am still waiting for expert input on the subject.
Where are the pro modelers?
best regards,
Christoph
Let's take a step-by-step approach to answering this question.
Properties on a node can be indexed, whereas properties on a relationship cannot be indexed yet.
Relationship properties are still useful, though. For example, storing dates as a property on the relationship could help when filtering events.
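As a rough sketch of the two points above: the index statement assumes the Neo4j 4.x-or-later syntax (older versions use a different form), and the index name, the :Event label, the ATTENDED relationship type, and the date value are all hypothetical.

```cypher
// Assumption: Neo4j 4.x+ index syntax; person_name is an illustrative index name.
CREATE INDEX person_name IF NOT EXISTS FOR (p:Person) ON (p.name);

// A lookup on the node property can be served by the index above.
MATCH (p:Person {name: "Christoph"})
RETURN p;

// Hypothetical event model: the date lives on the relationship and is used as a filter.
MATCH (p:Person)-[r:ATTENDED]->(e:Event)
WHERE r.date >= date("2020-01-01")
RETURN p.name, e.name, r.date;
```

In the last query the date filter is applied while the relationships are expanded, so it tends to work best when the starting nodes have already been narrowed down.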
Next, as @pingelsan said, having units of measurement like °C or °F as nodes could lead to extremely dense nodes.
MATCH (p:Person)-[:HAS_GIVENNAME]->(v:ValueNode) WHERE v.value = "Christoph" RETURN p
can be useful in cases where some kind of entity resolution or de-duplication is taking place, for example when a person with the same given name and the same SSN is creating multiple accounts.
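A de-duplication check along those lines might look like the sketch below. It assumes the SSN is modeled the same way as the given name, through a hypothetical HAS_SSN relationship to a :ValueNode; nothing in this thread confirms that model.

```cypher
// Assumption: HAS_SSN mirrors HAS_GIVENNAME and is purely illustrative.
// Find groups of Person nodes that share both a given name and an SSN.
MATCH (p:Person)-[:HAS_GIVENNAME]->(n:ValueNode),
      (p)-[:HAS_SSN]->(s:ValueNode)
WITH n, s, collect(p) AS candidates
WHERE size(candidates) > 1
RETURN n.value AS givenName, s.value AS ssn, candidates;
```

Because matching people share the very same value nodes, the duplicates fall out of a plain traversal rather than a property-to-property comparison.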
MATCH (p:Person {name:"Christoph"})-[:PURCHASED]->(i:Item) RETURN i
Here, modeling the given name as a separate node reached through a HAS_GIVENNAME relationship might not make sense, since looking up people by a shared name is probably not the use case we'll be looking at.
It will ultimately boil down to what you want to achieve with the data in your graph and the kind of queries you wish to run.