What is the impact, if any, of creating indices on a labeled node that are unused in many instances of that node? More specifically, what is the impact on write performance, read performance, and space?
I'm using neo4j-enterprise v4.4.4.
I have a labeled node (:AnalysisResult) whose instances are created by several different analyzers. Each analyzer is represented by a different labeled node (:Analyzer). My current design adds an index to :AnalysisResult for each property created by a specific analyzer.
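To make the setup concrete, the index definitions look roughly like the sketch below. The index names and property names (hotSpotScore, complexity) are hypothetical placeholders rather than my real schema.

```cypher
// Hypothetical example: one single-property index per analyzer-specific property.
// Index and property names are placeholders for illustration only.
CREATE INDEX analysis_result_hot_spot_score IF NOT EXISTS
FOR (r:AnalysisResult) ON (r.hotSpotScore);

CREATE INDEX analysis_result_complexity IF NOT EXISTS
FOR (r:AnalysisResult) ON (r.complexity);

// List the indexes currently defined on the label.
SHOW INDEXES YIELD name, labelsOrTypes, properties, state
WHERE 'AnalysisResult' IN labelsOrTypes
RETURN name, properties, state;
```

Repeat the CREATE INDEX pattern several dozen times and you have my current schema.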
There are several million instances of AnalysisResult in the database. Any one AnalysisResult instance has only a few properties; the others are unused. There are several dozen indices defined on AnalysisResult, and each instance populates only a handful (fewer than six) of the indexed properties.
When I do read queries, I use labeled relationships in a Cypher query to collect the AnalysisResult instances for a given analyzer:
MATCH (analysisResult:AnalysisResult)-[:ANALYZER]->(analyzer:Analyzer {analyzerID: 'aUniqueID'})
RETURN analysisResult, analyzer
The instances of AnalysisResult returned by this query do not have property values for the properties that are unused by the Analyzer that created them.
Is it worth refactoring the database to reduce the number of unused properties? For example, is it worth adding multiple labels, so that instead of :AnalysisResult, I instead create :AnalysisResult:HotSpot?
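A minimal sketch of what that refactoring might look like, assuming a hypothetical hotSpotScore property and assuming every HotSpot result comes from a single analyzer. With several million nodes, the SET step would probably need to be run in batches rather than in one transaction.

```cypher
// Hypothetical refactoring: add a second label to the results of one analyzer.
// 'aUniqueID' and hotSpotScore are placeholders, not real values from my schema.
MATCH (r:AnalysisResult)-[:ANALYZER]->(:Analyzer {analyzerID: 'aUniqueID'})
SET r:HotSpot;

// Define the analyzer-specific index on the narrower label
// instead of on :AnalysisResult.
CREATE INDEX hot_spot_score IF NOT EXISTS
FOR (h:HotSpot) ON (h.hotSpotScore);

// Reads that need the property would then match the narrower label.
MATCH (h:HotSpot)
WHERE h.hotSpotScore > 0.5
RETURN h;
```

The existing relationship-based query would keep working unchanged, since the nodes keep their :AnalysisResult label.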
I ask because I'm about to introduce another Analyzer that will create several new properties on the AnalysisResult nodes that it creates.
This is perhaps an object modeling question, but my immediate concern is whether the current approach I'm using is good enough.
The result is that I'll have several million instances of AnalysisResult that have only 2-3 properties populated, and the other indices defined on AnalysisResult will have nothing to index for those instances.
Please feel free to suggest whether this question belongs in a different discussion group.