Is it better for performance to put properties on relationships, or to add multiple nodes with the same name but different properties?

performance
cypher

(Fady) #1

In terms of performance and good design, is it better to have a single node (e.g. a person) connected to multiple other nodes (e.g. other people) through relationships of the same type but with different properties, OR to have multiple nodes for the same person, each with different features, connected to those other nodes?


(Michael Hunger) #2

Usually, a single node per entity is the better design.

Sometimes you can model different "accounts" as separate nodes; it depends a bit on the use case.


(Fady) #3

Hi Michael, thanks for responding. We'd like to use a single node per entity too, but to do so we'd need to put properties on the relationships rather than on the nodes themselves. Is that typically good practice? I'm attaching two scenarios here to illustrate my point in case I wasn't clear.

[Image: first scenario]
In the first picture, there is a single 'person3' node with two relationships recording that he was previously at houses 2 and 3, and we track the 'startTime' and 'endTime' on each [:PREVIOUSLY_AT] relationship.

[Image: second scenario]
In the second picture, there are two 'person3' nodes, each with a relationship to one of houses 2 and 3. With that design, we can keep 'startTime' and 'endTime' on each 'person3' node. However, this results in more than one node per entity.
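To make the trade-off between the two pictures concrete, here is a toy in-memory sketch in plain Python (not Neo4j). It builds both designs for the same history and counts how many nodes end up representing person3. `Stay`, `design_rel_props`, `design_dup_nodes`, and `nodes_named` are hypothetical helpers, and the ISO 8601 timestamps are assumed example data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stay:
    """One past stay of a person at a house (hypothetical helper type)."""
    house: str
    start_time: str
    end_time: str


def design_rel_props(person: str, stays: list[Stay]):
    """Design 1: one person node; startTime/endTime live on each relationship."""
    nodes = {person: {"name": person}}
    rels = []
    for s in stays:
        nodes[s.house] = {"name": s.house}
        rels.append((person, "PREVIOUSLY_AT", s.house,
                     {"startTime": s.start_time, "endTime": s.end_time}))
    return nodes, rels


def design_dup_nodes(person: str, stays: list[Stay]):
    """Design 2: one copy of the person per stay; times live on each copy."""
    nodes, rels = {}, []
    for i, s in enumerate(stays):
        nodes[s.house] = {"name": s.house}
        copy_id = f"{person}#{i}"  # internal id for each duplicate person node
        nodes[copy_id] = {"name": person,
                          "startTime": s.start_time, "endTime": s.end_time}
        rels.append((copy_id, "PREVIOUSLY_AT", s.house, {}))
    return nodes, rels


def nodes_named(nodes: dict, name: str) -> list[str]:
    """Ids of every node that represents the given entity name."""
    return [node_id for node_id, props in nodes.items() if props["name"] == name]


stays = [Stay("house2", "2019-01-07T19:00", "2019-01-10T09:00"),
         Stay("house3", "2019-01-10T09:00", "2019-01-14T18:00")]
nodes1, rels1 = design_rel_props("person3", stays)
nodes2, rels2 = design_dup_nodes("person3", stays)
print(len(nodes_named(nodes1, "person3")), len(nodes_named(nodes2, "person3")))  # 1 2
```

In the second design every new stay adds another person3 node, so any question about person3 as a whole first has to find all of its copies.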

Which one is recommended for performance? I'm planning to design this to handle real-time data ingestion, and it's expected to handle a lot of data.

Hope this makes the problem easier to visualize.


(Andrew Bowman) #4

Hello,

There's actually a third way that we would recommend. The concept of a :Residence (under this name or some similar name) seems to be important in your graph, signifying a person (or persons) staying at a location over a duration of time. This way a :Person can have several :Residences (with one being current), with a :Residence having a start and end date and a location of residence.

That lets you keep date information on :Residence nodes rather than on the relationship, and you won't have to duplicate your person nodes.
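A minimal in-memory sketch of that shape in plain Python (not Neo4j), assuming one `Residence` object per stay that points at its house and its persons. The class and function names are hypothetical, and the :IS / :OF names in the comments are placeholders for whatever relationship types you choose. The dates sit on the residence, and the person exists exactly once.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class House:
    name: str
    residences: list = field(default_factory=list, repr=False)  # incoming :OF


@dataclass(eq=False)
class Person:
    name: str
    residences: list = field(default_factory=list, repr=False)  # outgoing :IS


@dataclass(eq=False)
class Residence:
    """A stay at a house; the dates live here, not on a relationship."""
    house: House
    start_time: str
    end_time: Optional[str] = None  # None means the person still lives there
    persons: list = field(default_factory=list, repr=False)


def add_residence(person: Person, house: House, start: str,
                  end: Optional[str] = None) -> Residence:
    """Link (person)-[:IS]->(:Residence)-[:OF]->(house) in memory."""
    res = Residence(house, start, end, [person])
    person.residences.append(res)
    house.residences.append(res)
    return res


def current_residence(person: Person) -> Optional[Residence]:
    """The open-ended stay, if any."""
    open_stays = [r for r in person.residences if r.end_time is None]
    return open_stays[0] if open_stays else None


p3 = Person("person3")
house2, house3 = House("house2"), House("house3")
add_residence(p3, house2, "2019-01-07T19:00", "2019-01-10T09:00")
add_residence(p3, house3, "2019-01-10T09:00")
print([(r.house.name, r.start_time, r.end_time) for r in p3.residences])
```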


(Fady) #5

Hi Andrew, thanks a lot for the quick response! I just want to make sure I understand your recommendation correctly. You're saying that instead of having two relationship types:

  1. :CURRENTLY_AT
  2. :PREVIOUSLY_AT

we should instead add a node between each :person and :house, call it :residence, and connect it to both with two relationships. It would look something like this:

```cypher
(:person{name:'person1'})-[:IS]->(:residence{name:'currently', startTime:'Monday 7pm', endTime:'Thursday 9am'})-[:OF]->(:house{name:'house1'})
```

You mentioned something about the location of the residence; can you please elaborate on that? We need to capture how the houses are connected to each other and in what order. Also, we expect to keep adding people who visit a house at some point (they become currently_at that house) and who later move to a different house (they then become currently_at the new house and previously_at the old one, with timestamps for both houses).

Hope this makes sense and I really appreciate your help.


(Andrew Bowman) #6

The residence node looks about right. How specific you make the relationships to and from it is up to you. You can certainly use generic relationships like these, and whether you name them so the pattern reads as English (person IS residence OF house) or something else is your call. It may also be worth having a :CURRENT_RESIDENCE relationship (or some similar name), either in place of or in addition to the other relationships. That gives you a quick way to get the current residence of a person, or the current residences at a house, without matching on the generic relationships and sorting/limiting to find the current one. You would have to maintain it yourself, removing and adding the :CURRENT_RESIDENCE relationship as people move.
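Here is a plain-Python sketch (not Neo4j) of what maintaining that shortcut involves. The two dictionaries stand in for :CURRENT_RESIDENCE relationships seen from the person side and the house side; `ResidenceGraph`, `move`, and the other names are hypothetical. It assumes ISO 8601 timestamp strings, so string order matches time order.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass(eq=False)  # hashed by identity, like distinct graph nodes
class Residence:
    person: str
    house: str
    start_time: str  # ISO 8601, so string order == time order (assumption)
    end_time: Optional[str] = None


@dataclass
class ResidenceGraph:
    history: List[Residence] = field(default_factory=list)
    # Stand-ins for :CURRENT_RESIDENCE, reachable from either end.
    current_by_person: Dict[str, Residence] = field(default_factory=dict)
    current_by_house: Dict[str, Set[Residence]] = field(default_factory=dict)

    def move(self, person: str, house: str, when: str) -> Residence:
        """Close the old stay, open a new one, and re-point the shortcut."""
        old = self.current_by_person.get(person)
        if old is not None:
            old.end_time = when
            self.current_by_house[old.house].discard(old)
        new = Residence(person, house, when)
        self.history.append(new)
        self.current_by_person[person] = new
        self.current_by_house.setdefault(house, set()).add(new)
        return new

    def current_house(self, person: str) -> Optional[str]:
        """Shortcut lookup: one hop, no sorting."""
        res = self.current_by_person.get(person)
        return res.house if res is not None else None

    def current_house_by_sorting(self, person: str) -> Optional[str]:
        """Without the shortcut: filter this person's stays, take the latest."""
        stays = [r for r in self.history if r.person == person]
        return max(stays, key=lambda r: r.start_time).house if stays else None

    def people_at(self, house: str) -> Set[str]:
        """Everyone whose current residence is at this house."""
        return {r.person for r in self.current_by_house.get(house, set())}


g = ResidenceGraph()
g.move("person3", "house2", "2019-01-07T19:00")
g.move("person3", "house3", "2019-01-10T09:00")
print(g.current_house("person3"), g.people_at("house2"), g.people_at("house3"))
```

The cost of the shortcut shows up in `move`: every move has to update it, which is the maintenance described above.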

As for location, I was referring to your :house nodes. You can certainly add a spatial property value if that's useful, or connect them in some other way if it helps (a linked list between the houses, if they're next to each other on the same block?).
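If you go the linked-list route, reading the houses in order is just following one outgoing link per house. A plain-Python sketch, assuming a hypothetical (:house)-[:NEXT]->(:house) relationship represented here as a dictionary; the function names are also hypothetical.

```python
from typing import Dict, List, Optional


def houses_in_order(first: str, next_house: Dict[str, str]) -> List[str]:
    """Follow hypothetical :NEXT links from the first house on a block."""
    order: List[str] = []
    seen = set()
    current: Optional[str] = first
    while current is not None and current not in seen:  # stop at the end or on a cycle
        order.append(current)
        seen.add(current)
        current = next_house.get(current)
    return order


def is_next_door(a: str, b: str, next_house: Dict[str, str]) -> bool:
    """True if a and b are adjacent in either direction."""
    return next_house.get(a) == b or next_house.get(b) == a


block = {"house1": "house2", "house2": "house3"}
print(houses_in_order("house1", block))  # ['house1', 'house2', 'house3']
```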


(Fady) #8

That makes a lot of sense, thanks Andrew. We're planning to maintain them by removing and adding the relationships as people move to different houses. With that in mind, did you propose this because it's better to have properties on nodes rather than on relationships? I'm asking because we expect millions of new 'people' nodes to be added to the graph DB, so we want to make sure the design we choose can scale when we have tens of millions of people who were 'previously at' some houses.

As for location, we just need to know the flow, so I think a linked list is good enough, with houses connected to another node called city, then country, and so on.
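The house → city → country part of that plan is a simple upward walk. A plain-Python sketch, assuming a hypothetical :LOCATED_IN relationship from each house to its city and from each city to its country, represented as a parent dictionary; the function names and sample data are illustrative only.

```python
from typing import Dict, List


def location_chain(node: str, located_in: Dict[str, str]) -> List[str]:
    """Walk hypothetical :LOCATED_IN links upward: house -> city -> country."""
    chain = [node]
    while node in located_in and located_in[node] not in chain:  # guard against cycles
        node = located_in[node]
        chain.append(node)
    return chain


def same_city(a: str, b: str, located_in: Dict[str, str]) -> bool:
    """Two houses share a city if their immediate parent is the same node."""
    return located_in.get(a) is not None and located_in.get(a) == located_in.get(b)


located_in = {"house1": "city1", "house2": "city1", "house3": "city2",
              "city1": "country1", "city2": "country1"}
print(location_chain("house3", located_in))  # ['house3', 'city2', 'country1']
```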


(Andrew Bowman) #9

It may come down to how you plan to look things up in the graph. If you plan to look up :Residence nodes directly (as opposed to matching to a person or house first), then you'll likely need those nodes, since you can take advantage of a node index lookup.
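Why an index on the :Residence nodes matters can be sketched in plain Python (not Neo4j). `StartTimeIndex` is a toy sorted index standing in for a node index on a hypothetical `startTime` property; it jumps straight to matching entries with binary search. `started_between_by_traversal` shows the alternative with no index to start from, where every person and every one of their stays is visited. All names and data are hypothetical, and ISO 8601 strings are assumed so that string order matches time order.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Stay:
    person: str
    house: str
    start_time: str  # ISO 8601 strings sort in time order (assumption)


class StartTimeIndex:
    """Toy stand-in for a node index on :Residence(startTime)."""

    def __init__(self, stays: List[Stay]):
        # (key, position, stay): the unique position means stays are never compared
        self._entries = sorted((s.start_time, i, s) for i, s in enumerate(stays))
        self._keys = [key for key, _, _ in self._entries]

    def started_between(self, lo: str, hi: str) -> List[Stay]:
        """Binary-search to the matching range instead of scanning everything."""
        left = bisect.bisect_left(self._keys, lo)
        right = bisect.bisect_right(self._keys, hi)
        return [s for _, _, s in self._entries[left:right]]


def started_between_by_traversal(by_person: Dict[str, List[Stay]],
                                 lo: str, hi: str) -> List[Stay]:
    """No index to start from: visit every person and every one of their stays."""
    return [s for stays in by_person.values() for s in stays
            if lo <= s.start_time <= hi]


stays = [Stay("p1", "house1", "2019-01-02T08:00"),
         Stay("p2", "house2", "2019-01-05T12:00"),
         Stay("p1", "house3", "2019-02-01T09:00")]
by_person: Dict[str, List[Stay]] = {}
for s in stays:
    by_person.setdefault(s.person, []).append(s)

index = StartTimeIndex(stays)
print(index.started_between("2019-01-01T00:00", "2019-01-31T23:59"))
```

Both functions return the same stays; the difference is how much of the data each has to touch to find them.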

If this is for traversal only, not as a starting point, then relationships alone may work just fine. However, if you have cases where additional nodes need to participate, such as a residency of multiple persons that you want to track as a group rather than as individuals, then a node is the better model.
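The group case can be sketched in plain Python (not Neo4j). A relationship always connects exactly two nodes, so with relationships alone, "these people lived there together" has to be inferred from matching house and dates. A shared residence node records the group explicitly. The example assumes, hypothetically, two separate households in the same house over the same dates; all names and data are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(eq=False)
class Residence:
    """One residency node shared by everyone who lived there together."""
    house: str
    start_time: str
    end_time: str
    members: Set[str] = field(default_factory=set)


def co_residents(residences: List[Residence], person: str) -> Set[str]:
    """Everyone who shared at least one residency node with `person`."""
    others: Set[str] = set()
    for r in residences:
        if person in r.members:
            others |= r.members
    others.discard(person)
    return others


# Relationship-only alternative: (person, house, start, end) tuples.
Rel = Tuple[str, str, str, str]


def co_residents_from_rels(rels: List[Rel], person: str) -> Set[str]:
    """Grouping has to be guessed from matching house and dates."""
    mine = {(h, s, e) for p, h, s, e in rels if p == person}
    return {p for p, h, s, e in rels if (h, s, e) in mine and p != person}


# Two separate households in the same house over the same dates.
family_a = Residence("house1", "2019-01-01", "2019-06-30", {"p1", "p2"})
family_b = Residence("house1", "2019-01-01", "2019-06-30", {"p3"})
rels = [(p, r.house, r.start_time, r.end_time)
        for r in (family_a, family_b) for p in sorted(r.members)]
print(co_residents([family_a, family_b], "p1"))  # the nodes keep the groups apart
print(co_residents_from_rels(rels, "p1"))        # the rels cannot tell them apart
```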