Trying to understand some of the tradeoffs when refactoring duplicate data

sarah.n.golden · April 9, 2023, 10:25pm

Hi! I'm working my way through the courses, and I'm on the Refactoring Duplicate Data module in the Modeling Fundamentals course. I just learned about a great example where the languages in which a movie is available are listed as properties on the Movie nodes, but since most of them have "English" in the list, it's creating duplicate data. So this is refactored by making a Language node for "English" that can have a relationship with the Movie nodes. I understand the concept, but it brought two questions to mind:

1. In this case, we are trading a property of "English" for an IN_LANGUAGE relationship to "English". Why is this better? I know it is situational and depends on my use cases, but in general, are "duplicate" relationships better than duplicate properties?
2. Would this create the "super-node" issue cautioned against earlier in the courses? It would mean that we'd be creating a Language node for "English" and basically all the movies would point to it. Would that cause scalability problems? Or is it still better to do that than have "English" in the properties of basically every movie?

elaine_rosenber · April 10, 2023, 2:15pm

Hello Sarah,

Welcome to the Neo4j Community!

It's always better to eliminate duplicate data, as we teach in the course.

Since Neo4j does not yet support indexes on elements of a list, the best solution is to create an English node that the Movie node points to.
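To make the difference concrete, here is a rough sketch in plain Python (not Cypher, and not how Neo4j stores data internally). The sample movies and the function names are made up for illustration. It contrasts checking every movie's language list with following the links from a single language entry, which is roughly what the Language node gives you.

```python
from collections import defaultdict

# Hypothetical sample data: each movie keeps its languages as a list property.
movies = {
    "Toy Story": ["English"],
    "Amelie": ["French", "English"],
    "Spirited Away": ["Japanese", "English"],
}


def movies_in_language_scan(movies, language):
    """Before refactoring: every movie's list has to be inspected."""
    return sorted(title for title, langs in movies.items() if language in langs)


def refactor_to_language_nodes(movies):
    """After refactoring: one entry per language, linked to its movies.

    Each (language, title) pair plays the role of an IN_LANGUAGE relationship.
    """
    in_language = defaultdict(set)
    for title, langs in movies.items():
        for lang in langs:
            in_language[lang].add(title)
    return in_language


language_nodes = refactor_to_language_nodes(movies)

# Both approaches return the same movies; the second starts from one
# language entry instead of visiting every movie.
print(movies_in_language_scan(movies, "French"))
print(sorted(language_nodes["French"]))
```

Note that "English" now exists once as a key, rather than once per movie, which is the duplication the refactoring removes.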

If your data is such that you may end up with "super" nodes, you may want to model the data to avoid them, but you need to have millions of relationships on a node for this to happen.
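One way to judge whether a node is getting dense is to count the relationships per node. The sketch below does that in plain Python over made-up (movie, language) pairs. The data, the function names, and the threshold are assumptions for illustration only; in a real graph you would run the equivalent count in the database and use a threshold in the millions.

```python
from collections import Counter

# Hypothetical (movie, language) pairs standing in for IN_LANGUAGE relationships.
in_language = [
    ("Toy Story", "English"),
    ("Amelie", "French"),
    ("Amelie", "English"),
    ("Spirited Away", "Japanese"),
    ("Spirited Away", "English"),
]


def language_degrees(relationships):
    """Count how many relationships point at each language."""
    return Counter(language for _, language in relationships)


def dense_languages(relationships, threshold):
    """Return languages whose relationship count reaches the threshold."""
    degrees = language_degrees(relationships)
    return sorted(lang for lang, count in degrees.items() if count >= threshold)


# A tiny threshold is used here only so the toy data produces a result.
print(language_degrees(in_language))
print(dense_languages(in_language, threshold=3))
```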

The bottom line is that you should profile your important queries to make sure that they are covered by your data model.

That's the beauty of Neo4j: it is easy to refactor the graph to support a new data model.

Elaine
