Hi,
To be specific, let's consider the following simple example. We want to use Neo4j to model a survey system. Each survey consists of a set of questions, and each question has only two answers: Yes and No.
My idea is to model each question and each answer as a node and connect them with a relationship, e.g. HAS_ANSWER. Now I wonder whether the nodes representing answers should be shared/reused across surveys.
- If they are shared/reused, then even with millions of surveys in the database I will still have only two answer nodes, Yes and No, which seems good. However, these two nodes will have millions of incoming edges, which may cause performance problems. If so, at some point I may need to split the heavy nodes into smaller ones.
- Another approach is to duplicate the Yes/No answers across surveys. In this case, I will have millions of "duplicated" nodes in the database. On the other hand, I will not have very heavy nodes with millions of incoming edges.
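To make the trade-off concrete, here is a minimal sketch in plain Python (not Neo4j code) that builds the HAS_ANSWER edges for both options and counts the answer nodes and the largest in-degree. The helper names `build_edges` and `answer_stats` are hypothetical, and I am assuming that "duplicated" means one Yes/No pair per survey.

```python
from collections import Counter


def build_edges(num_surveys, questions_per_survey, shared):
    """Return HAS_ANSWER edges as (question, answer) pairs.

    shared=True  -> every question points at the same global Yes/No nodes.
    shared=False -> each survey gets its own Yes/No pair (assumption).
    """
    edges = []
    for s in range(num_surveys):
        for q in range(questions_per_survey):
            question = f"S{s}/Q{q}"
            for label in ("Yes", "No"):
                answer = label if shared else f"S{s}/{label}"
                edges.append((question, answer))
    return edges


def answer_stats(edges):
    """Return (number of answer nodes, largest in-degree of any answer node)."""
    in_degree = Counter(answer for _, answer in edges)
    return len(in_degree), max(in_degree.values())


shared_stats = answer_stats(build_edges(1000, 10, shared=True))
duplicated_stats = answer_stats(build_edges(1000, 10, shared=False))
print("shared:    ", shared_stats)      # (2, 10000)
print("duplicated:", duplicated_stats)  # (2000, 10)
```

With 1000 surveys of 10 questions each, the shared model has 2 answer nodes with 10000 incoming edges each, while the duplicated model has 2000 answer nodes with 10 incoming edges each. The number of edges is the same in both cases; only the number of answer nodes and their in-degree differ.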
Importantly, I anticipate that my queries will NOT traverse from the Yes/No nodes upward in the hierarchy, i.e. from answers to questions.
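My intuition about why the traversal direction matters can be sketched with a toy adjacency list in plain Python. This is only an illustration with hypothetical names (`add_has_answer`, `answers_of`, `questions_of`); whether Neo4j's storage behaves the same way is part of what I am asking.

```python
from collections import defaultdict

outgoing = defaultdict(list)  # question -> answers it points to
incoming = defaultdict(list)  # answer -> questions pointing at it


def add_has_answer(question, answer):
    """Record one HAS_ANSWER edge in both directions."""
    outgoing[question].append(answer)
    incoming[answer].append(question)


# Shared model: every question points at the same two answer nodes.
for q in range(100_000):
    add_has_answer(f"Q{q}", "Yes")
    add_has_answer(f"Q{q}", "No")


def answers_of(question):
    """Downward traversal: only this question's two edges are touched."""
    return outgoing[question]


def questions_of(answer):
    """Upward traversal: every edge into the heavy node is touched."""
    return incoming[answer]


print(answers_of("Q42"))          # ['Yes', 'No']
print(len(questions_of("Yes")))   # 100000
```

In this toy model, the question-to-answer lookup costs the same no matter how many edges the Yes node has; only the answer-to-question direction pays for the heavy node.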
Which approach do you think is better? Maybe there is some threshold, e.g. with up to X million incoming relationships go with solution 1, but beyond that go with solution 2.