Is it better to have many different relationship types or one relationship with properties?

awu · September 21, 2018, 4:51am

Hi - New to Graph and would like to learn more about modeling and design.

How would you best model an employee to company relationship, where you have a Company entity and a Person entity?

Would it be better to have

MATCH (n:Person)-[r:EMPLOYEE]->(m:Company) WHERE r.occupation = 'Janitor' RETURN n, r, m
or
MATCH (n:Person)-[r:JANITOR]->(m:Company) RETURN n, r, m

Is there a threshold beyond which there are too many relationship types between two nodes? Or is the database better optimized for relationship types than for properties on relationships?

Thanks in advance for your help.

stefan.armbruster · September 21, 2018, 9:35am

In most cases, having more specific relationship types is preferable to using generic ones. However, it's (in most cases) an antipattern to encode instance identifiers into a relationship type.

The reason for this is performance. In your example you need to iterate over all relationships and load the properties for each. This means 2 IO accesses for each. If you can be selective on relationship type instead of property, you only have one IO access.
On dense nodes it's even more of a difference since Neo4j maintains separate relationship chains for each relationship type.

The standard store format of Neo4j allows for 65k different relationship types.
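
One way to check the difference described above on your own data is to prefix both variants with PROFILE and compare the database hits reported in each plan. This is only a minimal sketch, assuming the labels, relationship types, and the occupation property from the original question exist in your graph:

```cypher
// Variant A: generic type, filtered on a relationship property.
// Every EMPLOYEE relationship is expanded and its property is read.
PROFILE
MATCH (n:Person)-[r:EMPLOYEE]->(m:Company)
WHERE r.occupation = 'Janitor'
RETURN count(*);

// Variant B: specific type, so no property lookup is needed.
PROFILE
MATCH (n:Person)-[r:JANITOR]->(m:Company)
RETURN count(*);
```

The absolute numbers will depend on your data and Neo4j version, so treat the comparison as indicative rather than definitive.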

awu · September 21, 2018, 1:59pm

Thanks @stefan.armbruster for the quick response!

Sounds like a classic situation where I'd give up readability and design to gain some performance improvements. It makes sense from the IO access perspective. Whether it is a better practice to proliferate with multiple relationship types versus one relationship with multiple properties is still a bit murky, but I'll try out both.

This discussion brought up another idea though, whether having multiple Entity types would be beneficial. To wit,

1) MATCH (n:Person)-[r:JANITOR]->(m:Company) RETURN n,r,m
or
2) MATCH (n:Janitor)-[r:EMPLOYEE]->(m:Company) RETURN n,r,m

and I exclude
3) MATCH (n:Person)-[r:EMPLOYEE]->(m:Company) WHERE n.occupation = 'Janitor' RETURN n,r,m
for similar reasons as above.

How do most people design their graph databases when trading off against performance? Are the delays negligible initially so it's really a matter of developer's preference? How will they fare at scale?

Thanks again.

stefan.armbruster · September 21, 2018, 8:06pm

Classic consulting answer "it depends".
If you consider janitor being a subclass of person you might assign two labels to that node (p:Person:Janitor).
I assume in your case janitor is only a valid concept in the context of a company, so I'd go with alternative 1). But - as said - it depends on the domain and your understanding of it.
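
For the two-label option, a rough sketch might look like the following. The node names and property values are made up for illustration; only the labels and the EMPLOYEE type come from this thread:

```cypher
// Hypothetical sample data: 'Alice' and 'Acme' are illustrative only.
CREATE (p:Person:Janitor {name: 'Alice'})
CREATE (c:Company {name: 'Acme'})
CREATE (p)-[:EMPLOYEE]->(c);

// Matching on the more specific label returns a node that is also a :Person.
MATCH (j:Janitor)-[:EMPLOYEE]->(c:Company)
RETURN j.name, c.name, labels(j);
```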

mike_r_black · September 24, 2018, 7:33pm

Another thing to also consider is what I call "Lazy Conversations". Take the email data model example that has been used many times as a graph example. We know we don't do: (user)-[emails]->(user) but that's actually a pitfall of lazy speech. We know it's a much more extensible model to do: (user)-[sends]->(email)-[to]->(user).

In your example, would occupation actually be another node: (user)-[has]->(occupation)-[employed at]->(company)? I would imagine a person could have more than one occupation/job role at a company, or at multiple companies concurrently. Then it's just a matter of writing Cypher optimized for the traversal to match the pattern of data you're looking for, and you'll get the performance you expect from a graph db.
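
As a sketch of that shape, assuming hypothetical labels (Occupation), relationship types (HAS, EMPLOYED_AT), and property names (name, title) that are not defined anywhere in this thread:

```cypher
// Hypothetical labels, relationship types, and properties for illustration.
CREATE (p:Person {name: 'Alice'})
CREATE (c:Company {name: 'Acme'})
CREATE (o:Occupation {title: 'Janitor'})
CREATE (p)-[:HAS]->(o)-[:EMPLOYED_AT]->(c);

// Find everyone holding a janitor role, and where they hold it.
MATCH (p:Person)-[:HAS]->(o:Occupation {title: 'Janitor'})-[:EMPLOYED_AT]->(c:Company)
RETURN p.name, c.name;
```

In this shape each Occupation node stands for one person's role at one company. A single shared 'Janitor' node would connect every janitor to every company that employs one, which is probably not what you want.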

awu · September 24, 2018, 9:23pm

@mike_r_black - This is great. Thank you.

It seems as if there's another possibility of adding a new node.

Is
MATCH (o:occupation {type:"Janitor"})<-[:IS]-(p:Person)-[:EMPLOYEE_OF]->(m:Company)
any better than
MATCH(n:Person)-[r:EMPLOYEE]->(m:Company) where n.occupation = 'Janitor' ?

I do like how this allows for multiple roles/occupations as is mentioned and the cypher query is easier to understand.

elena · September 6, 2019, 1:40am

Hiya,

I know that this is a late response, but it's odd to me that this answer doesn't refer to this excellent piece of documentation:

Quote:

I ran a query against each database 100 times and then took the 50th, 75th and 99th percentiles (times are in ms):

Using a generic relationship type and then filtering by end node property
50%ile: 6.0    75%ile: 6.0    99%ile: 402.60999999999825
 
Using a generic relationship type and then filtering by relationship property
50%ile: 21.0   75%ile: 22.0   99%ile: 504.85999999999785
 
Using a generic relationship type and then filtering by end node label
50%ile: 4.0    75%ile: 4.0    99%ile: 145.65999999999931
 
Using a specific relationship type
50%ile: 0.0    75%ile: 1.0    99%ile: 25.749999999999872

My Summary:

Good: specific relationship type (25.7)
99%ile: 25.749999999999872 Total database accesses: 10,002
(:Person)-[ :HAS_EYES ]→(:Attr {colour:"blue"})

Worse: end node label (145.6)
99%ile: 145.65999999999931 Total database accesses: 70,001
(:Person)-[:HAS]→(:Attr :Eyes {colour:"blue"})

Pretty Bad: end node property (402.6)
99%ile: 402.60999999999825 Total database accesses: 140,001
(:Person)-[:HAS]→(:Attr {type:"eyes", colour:"blue"})

Very Bad: relationship property (504.8)
99%ile: 504.85999999999785 Total database accesses: 140,001
(:Person)-[:HAS {type:"eyes"} ]→(:Attr {colour:"blue"})

NB: I have referred to this often while learning, and I think it's a terrific summary! It's relatively old, though, and I wonder if these algos have been updated. It'd be nice to rerun these some time, ping @mark.needham
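
For anyone wanting to rerun the comparison, the four shapes in the summary could be profiled roughly like this. It is a sketch built only from the patterns listed above; the queries used in the quoted benchmark may well have differed:

```cypher
// 1. Specific relationship type
PROFILE MATCH (:Person)-[:HAS_EYES]->(:Attr {colour: 'blue'}) RETURN count(*);

// 2. Generic type, filtered by end node label
PROFILE MATCH (:Person)-[:HAS]->(:Attr:Eyes {colour: 'blue'}) RETURN count(*);

// 3. Generic type, filtered by end node property
PROFILE MATCH (:Person)-[:HAS]->(:Attr {type: 'eyes', colour: 'blue'}) RETURN count(*);

// 4. Generic type, filtered by relationship property
PROFILE MATCH (:Person)-[:HAS {type: 'eyes'}]->(:Attr {colour: 'blue'}) RETURN count(*);
```

Each query assumes a separate database (or dataset) modeled in the matching shape, as in the quoted test.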

mike_r_black · September 23, 2019, 2:46am

There's also this video, an excellent explanation of how to model your graph around the queries you'll be writing: Secret Sauce of Neo4j: Modeling and Querying Graphs

dominicvivek06 · September 23, 2019, 6:29am

My blog on the performance

Kailash · January 6, 2020, 6:21pm

My view would be to go with the relationship type... though data modeling and an execution plan will help.

mithun.das · January 23, 2020, 9:38pm

It looks like
(:City)-[:TRANSLATION { code: 'lang.code'}]->(:CityTranslation) is the "very bad" performer...
so
(:City)-[:TRANSLATION]->(:CityTranslation { code: lang.code} ) will perform better for me.
Is that right?
I was thinking of using lang.code as a dynamic relationship type, but then there could be too many languages... In our use case the user inputs data with a language code (we provide the list of language codes).
