Hi! I'm working my way through the courses, and I'm on the Refactoring Duplicate Data module in the Modeling Fundamentals course. I just learned about a great example where the languages a movie is available in are stored as a list property on each Movie node. Since most of those lists contain "English", the same value is duplicated across many nodes. The course refactors this by creating a Language node for "English" and giving each Movie node a relationship to it. I understand the concept, but it brought two questions to mind:
- In this case, we are trading a property of "English" for an IN_LANGUAGE relationship to "English". Why is this better? I know it is situational and depends on my use cases, but in general, are "duplicate" relationships better than duplicate properties?
- Would this create the "super-node" issue cautioned against earlier in the courses? It would mean that we'd be creating a Language node for "English" and basically all the movies would point to it. Would that cause scalability problems? Or is it still better to do that than have "English" in the properties of basically every movie?
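
For context, here is roughly the refactoring I'm asking about, written from memory as a sketch rather than copied from the course. The property name `languages` and the `name` property on the Language node are my assumptions; the IN_LANGUAGE relationship type is the one mentioned above.

```cypher
// Assumption: each Movie stores its languages as a list property called `languages`.
// Turn every distinct language value into a single Language node
// and connect each movie to the languages it is available in.
MATCH (m:Movie)
UNWIND m.languages AS language
WITH language, collect(m) AS movies
MERGE (l:Language {name: language})   // `name` is an assumed property key
WITH l, movies
UNWIND movies AS m
MERGE (m)-[:IN_LANGUAGE]->(l);

// Once the relationships exist, drop the now-redundant list property.
MATCH (m:Movie)
REMOVE m.languages;
```

After running something like this, the single "English" Language node would have one incoming IN_LANGUAGE relationship for nearly every movie, which is the situation my second question is about.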