I have a use case where we download files from different cloud vendors such as Google Drive, OneDrive, Box, etc.
So each file node can have two labels: File, plus a source label (Dropbox, Box, OneDrive, etc.).
This way, if I want to query all files irrespective of source, I can query using the File label.
Now I want to understand the performance impact of instead querying by listing every individual source label (Dropbox, Box, etc.) rather than using the File label.
If there is no difference, I can remove the File label, right? I also want to understand the memory and performance impact of adding a label.
Can someone explain the impact in detail?
As I understand it, a node label is a set-like collection of all nodes that share that label (actually pointers to the nodes). So when you do:
MATCH (n:MyLabel)
Cypher only has to do a linear search through the smaller subset of MyLabel nodes rather than through the nodes of all labels. The former is obviously quicker.
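To make the comparison concrete, here is a minimal sketch. The name property and the sample values are made up for illustration. PROFILE is standard Cypher and prints the operators the planner actually used, so it is the reliable way to check the behaviour on your own Neo4j version and data rather than taking my word for it.

```cypher
// Hypothetical sample data: each node carries File plus one source label.
CREATE (:File:Dropbox  {name: 'report.pdf'}),
       (:File:Box      {name: 'notes.txt'}),
       (:File:OneDrive {name: 'budget.xlsx'});

// Label-scoped match: the plan can start from a scan of File nodes only.
PROFILE MATCH (f:File) RETURN f.name;

// Unscoped match: the plan has to start from every node in the graph.
PROFILE MATCH (n) RETURN n.name;
```

On a graph that also contains many non-file nodes, the difference in rows touched between the two profiled plans is the saving the label buys you.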
The overhead of keeping the File label is that Neo4j has to maintain an extra data structure (something like a set) to track all the File nodes.
Whether this makes sense for you or not depends on your individual situation. Some things to consider:
- Do you mostly need to match Files of some specific type?
- Do you need to match all Files? Or might you need to?
- Might you add a new File label (node type) in the future?
If you do want to match all files in the future and you don't have a File label, then you would have to do something like:
MATCH (f) WHERE f:Dropbox OR f:Box OR f:OneDrive OR ...
instead of the simpler MATCH (f:File). The latter should be faster.
And if you are later required to support a new File type (source), you will probably have to modify your Cypher code to deal with it, which is a pain and a potential source of errors because you might forget to fix some of your code.
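As a sketch of that maintenance cost, assume a hypothetical new source label GoogleDrive is introduced later (the label name and the name property are illustrative only). With a shared File label the catch-all query stays the same; without it, each catch-all query needs another branch.

```cypher
// New source arrives: its nodes get the shared File label plus the new one.
CREATE (:File:GoogleDrive {name: 'slides.pptx'});

// With the File label: query is unchanged and the new nodes are included.
MATCH (f:File) RETURN f.name;

// Without the File label: every such query must be edited by hand.
MATCH (f)
WHERE f:Dropbox OR f:Box OR f:OneDrive OR f:GoogleDrive
RETURN f.name;
```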
If you are really short of memory, then consider compressing the File names/paths.
I vote for keeping the File label for now, as it will give you greater flexibility and make your code more robust to unforeseen changes. Maybe after a while, when you discover you don't really need the File label, you can then safely remove it.