What is the better data model: creating more nodes, or utilizing more properties?

henry007 · May 15, 2023, 4:59pm

Hello, I'm building out a data model in Neo4j and want to know which of two options would be the better model. See the attached image.

In Option 1, Person is connected to two EmailAddress nodes (as they have two emails), and three Address nodes, each with a distinct relationship type based on whether it's a billing, mailing, or living address.

In Option 2, all of the data is instead properties on the Person node. The emails they have are stored as a list.

Multiple Person nodes may share data. For example, if two people have Boston as their living city, then in Option 1 both would have LIVES_AT relationships to the same Boston node, while in Option 2 each Person would just have the property living_city set to "Boston".
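To make the difference concrete, here is a minimal sketch in plain Python. Dictionaries stand in for the graph, so this is not Neo4j code. The IDs, names, and helper functions are made up for illustration; only the LIVES_AT relationship and the living_city property come from the two options above.

```python
# Option 1: shared values are nodes of their own, linked by typed relationships.
nodes = {
    "p1": {"label": "Person", "name": "Alice"},
    "p2": {"label": "Person", "name": "Bob"},
    "p3": {"label": "Person", "name": "Carol"},
    "a1": {"label": "Address", "city": "Boston"},
    "a2": {"label": "Address", "city": "Chicago"},
    "e1": {"label": "EmailAddress", "value": "alice@example.com"},
}
relationships = [
    ("p1", "LIVES_AT", "a1"),
    ("p2", "LIVES_AT", "a1"),
    ("p3", "LIVES_AT", "a2"),
    ("p1", "HAS_EMAIL", "e1"),
]

def shares_node_with(person_id):
    """People connected to any node that person_id is also connected to."""
    targets = {dst for src, _, dst in relationships if src == person_id}
    return sorted({src for src, _, dst in relationships
                   if dst in targets and src != person_id})

# Option 2: the same facts stored as properties on each Person.
people = {
    "p1": {"name": "Alice", "living_city": "Boston", "emails": ["alice@example.com"]},
    "p2": {"name": "Bob", "living_city": "Boston", "emails": []},
    "p3": {"name": "Carol", "living_city": "Chicago", "emails": []},
}

def shares_value_with(person_id, key):
    """People whose property `key` equals person_id's value (compares every record)."""
    value = people[person_id][key]
    return sorted(pid for pid, props in people.items()
                  if pid != person_id and props.get(key) == value)
```

In Option 1 the shared Boston node is itself the link between the two people. In Option 2 the link only exists implicitly, as two property values that happen to match.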

We are interested in running some of the graph algorithms on our data and I understand that Option 1 would be more suited for a use case like that. But would it make more sense to use properties with Option 2 to not clutter our model with too many nodes?

Thank you!

glilienfield · May 15, 2023, 11:34pm

You described what I was going to tell you. Graph databases are for analyzing relationships between entities. They are especially powerful with a network of connected data. Analyzing such relationships with a relational database is impractical.

You should use nodes for information that can be used to relate multiple entities. Properties are good for metadata.

Sometimes I feel people want to use a graph database because it is new and in fashion, when they could do just fine with a relational database if their data fits parent and child tables.

ameyasoft · May 16, 2023, 9:28pm

I like Option 1 as it presents a good pictorial description of the issues. It's like seeing is believing!

mbandor · May 17, 2023, 1:34pm

Technically you could implement both. If you are looking for people living in a specific location, then having the location as a node would be more efficient, as you can start at that node and follow its relationships instead of checking the property on every Person node. I would not include the specific address in the location node, as that is specific to the person, not the location. It could instead be stored as a property on the LIVES_AT relationship.
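A rough sketch of that layout in plain Python (not Neo4j code): the location holds only the city, and the person-specific street sits on the LIVES_AT relationship. The `street` property name, the sample data, and the `residents` helper are assumptions for illustration. The lookup table only mimics starting at a location node and following its relationships; it ignores property indexes, which Neo4j also supports.

```python
from collections import defaultdict

# (person, location, relationship properties) for each LIVES_AT relationship.
lives_at = [
    ("alice", "boston", {"street": "1 Example St"}),
    ("bob", "boston", {"street": "2 Sample Ave"}),
    ("carol", "chicago", {"street": "3 Demo Rd"}),
]

# Group relationships by location, so a lookup starts at the location
# instead of scanning every person.
by_location = defaultdict(list)
for person, location, props in lives_at:
    by_location[location].append((person, props))

def residents(location):
    """Return (person, street) pairs for everyone living at `location`."""
    return [(person, props["street"])
            for person, props in by_location.get(location, [])]
```

For example, `residents("boston")` returns Alice and Bob with their own streets, while the Boston location itself stays free of person-specific data.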

james2 · May 20, 2023, 12:45pm

Nodes are the smarter choice. Don't worry about cluttering the db with too many nodes; Neo4j is designed to efficiently handle a very large number of them.

If your model is purely a list of Person nodes, like an address book, then it doesn't make much difference. There would also be no reason to prefer a graph db over a relational one, either.

However, if your model includes interactions between people and/or organisations, and email is involved, there's additional information available here:

The fact that the communication was by email, not by post.
Which address was involved.

The same thing goes for the physical addresses, and this is where graph databases start to shine.

When the model is small and simple, this might look like hair-splitting. However, as you accumulate a large number of entities, and a large number of interactions between them, there's an increasing amount of information in these distinctions. One of the terms for this kind of thing is "traffic analysis" - discovering relationships between people/organisations, and the nature of those relationships, just by looking at who communicated with whom, by what address, and when, without ever needing to know the content of the communications.

You can always start with one of these approaches and switch to the other later, but this additional information is only available if you use nodes to represent addresses. If you start with nodes and later discover that you really don't need that extra information, you can collapse the address nodes into attributes in the Person nodes, and throw it away. However, if you start with attributes, you're throwing that information away at the start, and you can only add it in later by going back to the original source data.
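A small plain-Python sketch of why the switch only goes cleanly in one direction (not Neo4j code; the addresses, names, and both helper functions are hypothetical). When addresses are their own entities, each message can point at the exact address used. Once everything is recorded at the person level, that detail is gone.

```python
# Node-based model: each email address is its own entity with an owner,
# and each message records the exact sender and recipient address.
owner_of = {
    "alice@work.example": "Alice",
    "alice@home.example": "Alice",
    "bob@example.com": "Bob",
}
messages = [
    ("alice@work.example", "bob@example.com"),
    ("alice@home.example", "bob@example.com"),
]

def collapse_to_properties(owner_of):
    """Fold address nodes into a list property per person (the Option 2 shape)."""
    people = {}
    for address, person in owner_of.items():
        people.setdefault(person, []).append(address)
    return people

def person_level(messages, owner_of):
    """What the messages look like when only people are recorded, not addresses."""
    return [(owner_of[sender], owner_of[recipient])
            for sender, recipient in messages]
```

After collapsing, both messages read as "Alice to Bob", and nothing in the person-level data says whether the work or the home address was used. Getting that back means returning to the source data.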

oruckenanyildirim · November 10, 2024, 8:49pm

I would also choose both.
Choose Option 1 for shareable data: many people may live in Boston, or on street S.
Choose Option 2 for data private to one person: this person's house is X.

As for too many nodes: there are never too many if you are after information that has to be discovered or built up from scratch.
