Best practices when choosing relationship direction/name?

Hi all,

Sorry if this is a total noob question, but I've been looking through posts and googling and haven't found anything that really answers it.

As a bit of background, I've spent over a decade modelling relational databases. I'm comfortable with the concepts of graph modelling (and actually pretty excited about the possibilities), and am currently trying to model a subset of tables in one of my SQL Server databases as a graph.

I've got no problem creating the entities as nodes, and am fine with creating the foreign keys as relationships between the nodes, but what I'm struggling with is defining the relationship itself with regard to direction and name. From looking at the examples in the documentation and elsewhere, it seems most common to start from the child/many side and end at the parent/one side, if one were to put it in more relational terms: e.g., (:User)-[:MEMBER_OF]->(:Group) and not (:Group)-[:HAS_MEMBER]->(:User). But this isn't always the case, as in the Northwind refactoring examples, where it's (:Employee)-[:SOLD]->(:Order) and not (:Order)-[:SOLD_BY]->(:Employee).

I know it's not super important, since it's my understanding that I can query based on incoming and outgoing relationships, and even ignore directionality when querying, but as someone who likes to know (and usually follow) rules, or even rules of thumb, I was wondering if anyone had any advice on how they usually tackle this.

Thanks,
Laura

I'm wondering about the answer to this question too.
How should we determine the relationship direction?

I don't think there's going to be any one right or wrong answer. I think you'll find the important part is just being consistent in whichever convention you choose. I would say you'll probably define things by the way you're looking for answers to questions. If the questions you're seeking answers to typically start from the user, you'll probably model more like (:User)-[:MEMBER_OF]->(:Group), but if your questions typically start from the group, you'll model in the reverse: (:Group)-[:HAS_MEMBER]->(:User).

There are models where you'll find the relationship only makes sense one way: boy loves girl, but girl does not love boy. So there's a relationship in one direction, but it's not reciprocated in the other direction.

Hope this helps; it's just my opinion on the matter.

Mike, is it possible to have both relationships? If so, does having multiple relationships impact the performance of the graph DB?

Thanks,
Mahendar

Academically, yes, there's an impact from having a relationship going both ways, because you're storing more data. But storing excess data is not nearly as detrimental to performance as it would be in an RDBMS. Keep in mind that although a relationship has to be directional when you create it, you don't have to specify a direction when you query; you can leave it open-ended to match either direction.
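To make the "create with a direction, query in any direction" point concrete, here is a minimal sketch in plain Python. It is not the Neo4j API or its storage format; the `TinyGraph` class, its method names, and the sample users and group are all made up for illustration. It only shows that a single stored relationship can answer questions from either end.

```python
from collections import defaultdict


class TinyGraph:
    """Toy in-memory graph (hypothetical): each relationship is stored once, with a direction."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # start node -> [(type, end node)]
        self.in_edges = defaultdict(list)   # end node -> [(type, start node)]

    def relate(self, start, rel_type, end):
        # One logical relationship, indexed from both of its endpoints.
        self.out_edges[start].append((rel_type, end))
        self.in_edges[end].append((rel_type, start))

    def neighbours(self, node, rel_type, direction="both"):
        # direction is "out", "in", or "both" (direction ignored).
        found = []
        if direction in ("out", "both"):
            found += [n for t, n in self.out_edges[node] if t == rel_type]
        if direction in ("in", "both"):
            found += [n for t, n in self.in_edges[node] if t == rel_type]
        return found


g = TinyGraph()
g.relate("laura", "MEMBER_OF", "admins")
g.relate("mike", "MEMBER_OF", "admins")

# "Which groups is laura in?" follows the relationship forwards.
groups_of_laura = g.neighbours("laura", "MEMBER_OF", direction="out")
# "Who is in admins?" follows the same relationships backwards.
members_of_admins = g.neighbours("admins", "MEMBER_OF", direction="in")
# Leaving the direction open also works from either end.
either_way = g.neighbours("admins", "MEMBER_OF")
```

In this toy model a separate HAS_MEMBER relationship would add nothing that the incoming MEMBER_OF lookup doesn't already give you, which is the gist of the advice above.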

There's a really helpful video on YouTube that explains the storage and query architecture of Neo4j: "Secret Sauce of Neo4j: Modeling and Querying Graphs". Once you understand more of how things work under the hood, you'll be able to model your DB better.

This would depend upon the kind of queries you need to make.

For example, if you had reciprocal :WORKS_WITH relationships, such that:

(sam:Person{name:'Sam'})-[:WORKS_WITH]->(chris:Person{name:'Chris'})
(chris)-[:WORKS_WITH]->(sam)

and every :WORKS_WITH relationship was reciprocal, while there wouldn't necessarily be an intrinsic cost (besides Mike's note on storage space), certain kinds of queries can run into trouble with such a model.

Consider:

MATCH (s:Person {name:'Sam'})-[:WORKS_WITH*4]-(someoneElse)
...

When there was only one :WORKS_WITH relationship between two people, we could be assured that we could never immediately backtrack to a previously visited person, because once a relationship is traversed in a path, it cannot be traversed again in that path (though we could still end up at the same person through some more roundabout way). But because we now have two :WORKS_WITH relationships per pair of people, with just two hops we can immediately backtrack: one hop from sam to chris, then the next hop using the other relationship to jump back from chris to sam. This can mess up your expected results, and more importantly it could cost you performance-wise, as the number of possible paths when using variable-length relationships can increase significantly with more relationships available to traverse when finding paths matching a pattern.

So our recommendation is to avoid reciprocal relationship pairs between nodes, unless the relationship is one that doesn't inherently imply reciprocation (such as :LIKES, which may be one-sided).
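As a rough illustration of that path growth, here is a small plain-Python sketch (not Neo4j itself). It counts paths of exactly four hops from one person, ignoring direction and never reusing a relationship within a path, which is the rule described above. The `count_paths` helper and the five-person chain (alex, jo and pat are invented names) are assumptions made up for this example.

```python
def count_paths(relationships, start, hops):
    """Count paths of exactly `hops` relationships starting at `start`.

    Direction is ignored, and a relationship may be used only once per
    path, although a node may be visited more than once.
    """
    def walk(node, used, remaining):
        if remaining == 0:
            return 1
        total = 0
        for rel_id, (a, b) in enumerate(relationships):
            if rel_id in used:
                continue  # this relationship is already part of the path
            if a == node:
                total += walk(b, used | {rel_id}, remaining - 1)
            elif b == node:
                total += walk(a, used | {rel_id}, remaining - 1)
        return total

    return walk(start, frozenset(), hops)


# Hypothetical chain of colleagues: sam - chris - alex - jo - pat
single = [("sam", "chris"), ("chris", "alex"), ("alex", "jo"), ("jo", "pat")]
# Same people, but every WORKS_WITH is also stored in the reverse direction.
reciprocal = single + [(b, a) for a, b in single]

single_count = count_paths(single, "sam", 4)
reciprocal_count = count_paths(reciprocal, "sam", 4)
print(single_count, reciprocal_count)
```

On this tiny chain the single-relationship model yields 1 path and the reciprocal model yields 28, several of which double back over the same pair of people. Real graphs will differ, but the growth comes from the same source.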

Thanks Mike. It helps.

Thanks Andrew. It helps.

I have a related relationship naming question...

Should relationship names be more generic, so that they are shared by nodes with very different labels, or should they be specific to particular node types? (This is assuming that the number of relationship types doesn't exceed the maximum number of relationship types.)

E.g.

(c:Cow)-[:IS_A]->(m:Mammal)
(t:Toaster)-[:IS_A]->(a:Appliance)

vs.

(c:Cow)-[:IS_A_LIVING_TYPE]->(m:Mammal)
(t:Toaster)-[:IS_A_THING_TYPE]->(a:Appliance)

Or does it depend, and if so, on what circumstances? (I imagine having more relationship types could make things run faster.)

I think more specific relationship types could make the Cypher clearer about which node types you intended to connect. On the other hand, they clutter up the code.

(Added)
I just found this, which confirms my guess that specific relationship types perform better:

TIA

I have a question, though not a very specific one; I'm still at the beginner stage of using graphs in my machine learning journey, and I hope this is the right place to ask. Does relationship direction affect some aspects of your ML? For example, the way node2vec constructs its embeddings is similar to the Word2vec idea of using a set of words to predict one word, and hence learning word meaning. On a graph this is done by traversal, so if relationship direction affects the traversal mechanism, it could affect which features the embedding captures. What advice is there for building these relationships, or perhaps the projection of your graph, so that the node embeddings represent what they need to learn? I'd appreciate more than a simple answer, please! For example, a person's buying behaviour could be learned by embedding person nodes through traversal of their buying connections. Please help by sharing your knowledge on this. Thank you.