Advantage of using a single label versus multiple labels

I have a use case where we download files from different cloud vendors such as Google Drive, OneDrive, Box, etc.
Each file node can therefore have two labels: File, plus a source label (Dropbox, Box, OneDrive, etc.).
This way, if I want to query all files irrespective of source, I can query using the File label.

Now I want to understand the performance impact of querying by the individual source labels instead, listing every cloud source (Dropbox, Box, etc.) rather than using the File label.

If there is no difference, I can remove the File label, right? I also want to understand what adding a label costs in terms of memory and performance.

Can someone explain the impact in detail?

As I understand it, a Node Label is a set-like collection of all Nodes that share that label. (Actually pointers to the Nodes.) So when you do:

MATCH (n:MyLabel)
instead of
MATCH (n)
Cypher only has to scan the smaller subset of nodes labeled MyLabel rather than all nodes in the database. The former is obviously quicker.
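
One way to check this on your own data is to prefix each query with PROFILE and compare the plans. This is only a sketch, assuming a File label exists in your graph; the operator names in the comments are what recent Neo4j versions typically report and may differ in yours.

```cypher
// Restricts the scan to nodes carrying the File label
// (typically shown as a NodeByLabelScan operator).
PROFILE MATCH (n:File) RETURN n;

// Visits every node in the database
// (typically shown as an AllNodesScan operator).
PROFILE MATCH (n) RETURN n;
```

Comparing the row counts and db hits of the two plans should show how much work the label saves.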

The overhead of keeping the File label is that Neo4j has to maintain an extra data structure (something like a Set) to track all the File nodes.

Whether this makes sense for you depends on your individual situation. Some things to consider:

  • Do you have other nodes that are not Files of some type?
  • Do you ever do a MATCH on all Files? Or might you need to?
  • Is memory (either disk or RAM) a limiting factor? (I suspect that a Set of Nodes is probably not that expensive.)
  • Will or can your situation change so that you will be needing a File Label (node type) in the future?

If you do want to match all files in the future and you don't have a File label, then you would have to do something like:

MATCH (f)
WHERE f:Dropbox OR f:Box OR f:OneDrive // ... and so on for every other source
RETURN f

instead of a simpler MATCH (f:File). The latter should be faster.

And if you later need to support a new source, you will probably have to modify your Cypher code to handle the new label (which is a pain and a potential source of errors, because you might forget to fix some of your code).
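
To illustrate, here is a sketch of onboarding a new source when every file also carries the File label. The GoogleDrive label spelling and the name property are made up for the example.

```cypher
// A file from a new source gets the generic label plus its source label.
// (GoogleDrive and name are hypothetical.)
CREATE (f:File:GoogleDrive {name: 'report.pdf'});

// Existing queries over all files pick it up without any changes.
MATCH (f:File) RETURN f.name;
```

Without the File label, the second query would need another OR branch for every source you add.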

If you are really short of memory, then consider compressing the File names/paths.

I vote for keeping the File label for now, as it will give you greater flexibility and make your code more robust to unforeseen changes. If after a while you discover that you don't really need the File label, you can then safely REMOVE it.
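
For reference, dropping the label later (and restoring it if you change your mind) could look roughly like this. It assumes the source labels are exactly Dropbox, Box and OneDrive; extend the list to match your data. On a large graph you would probably want to run these in batches.

```cypher
// Drop the generic label from every node that has it.
MATCH (f:File) REMOVE f:File;

// Add it back later to every node that carries one of the source labels.
MATCH (f)
WHERE f:Dropbox OR f:Box OR f:OneDrive
SET f:File;
```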
