Performance of a single global ID versus one ID per group of labels

Hello,

I apologize in advance if this has been asked already.

I have a dataset which has different entities and sub-entities.
Conceptually, I have for example the following node labels:

User
ForumUser
EmailUser
SubForum1User
SubForum2User

Now, all nodes will have the label User.
All nodes with label SubForumXUser will have labels ForumUser and User.

Would there be any disadvantage in keeping a global my_id field on all nodes that have label User and using it for indexing, as opposed to using a different id field for each label type?

Some of my queries will retrieve a subset of the nodes that have label ForumUser.
Other queries will retrieve subsets of nodes with other labels.

From my understanding, specifying the label in the query will already reduce the range of nodes to be considered because the label defines an anchoring point.

But when searching within a specific label, would I pay a performance penalty for having a global my_id on all nodes and indexing it on label User?
I ask this because such a solution would allow me to express queries more simply.

Or would I get better performance by using a label_X_id for indexing each label X at the expense of having more indices?
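To make the two options concrete, here is a sketch of both schemas in Cypher, using the Neo4j 4.x constraint syntax that appears later in this thread. The per-label property names (forum_id, subforum1_id) and the constraint names are hypothetical, chosen only for illustration.

```cypher
// Option A: a single global id on every :User node.
// One uniqueness constraint, which also gives one index on :User(my_id).
CREATE CONSTRAINT global_id_constraint
ON (user:User) ASSERT user.my_id IS UNIQUE;

// Option B (hypothetical property names): a separate id per label.
// Each label needs its own constraint, and so its own index to maintain on writes.
CREATE CONSTRAINT forum_id_constraint
ON (u:ForumUser) ASSERT u.forum_id IS UNIQUE;

CREATE CONSTRAINT subforum1_id_constraint
ON (u:SubForum1User) ASSERT u.subforum1_id IS UNIQUE;
```

In Neo4j 5 the same constraint is written as CREATE CONSTRAINT global_id_constraint FOR (user:User) REQUIRE user.my_id IS UNIQUE.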

Thank you for your time.

Hello @mcbr and welcome to the Neo4j community :slight_smile:

  • You can use your global ID and put a UNIQUE CONSTRAINT on it.
  • You don't need a separate ID for each group of labels; if you want to restrict a query, you can use the label itself.
  • Then, if you want to search on something else, for example text or another property, you can use a search index.
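A minimal sketch of restricting by label instead of by a per-group id, assuming the labels from the question and the full_name property mentioned later in the thread. With a label in the MATCH pattern, the planner can start from only the nodes carrying that label (a label scan), so no extra id property is needed for this kind of restriction.

```cypher
// Every forum user, whatever sub-forum they belong to.
MATCH (u:ForumUser)
RETURN u.my_id, u.full_name;

// Only the users of one sub-forum.
MATCH (u:SubForum1User)
RETURN u.my_id, u.full_name;
```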

Regards,
Cobra

Thanks for the information.

So, for example, let's imagine I go with a global id called my_id for all entities (they all have the label User) and create a UNIQUE CONSTRAINT like this:

CREATE CONSTRAINT global_id_constraint
ON (user:User) ASSERT user.my_id IS UNIQUE

This way I have a global ID field for users, no matter the labels they have.
From what I understand, this constraint also creates a single-property index for my_id.
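A sketch of lookups by the global id, assuming my_id is set on every :User node and $id is a query parameter (in Neo4j Browser it can be set with :param id => 42). Neo4j has no label hierarchy, so as far as I know the :User(my_id) index is only considered when :User appears in the pattern; a pattern with only :SubForum1User cannot use it. Keeping both labels lets the planner seek on the index and then check the extra label, and PROFILE shows which operator was actually chosen.

```cypher
// Lookup by global id, backed by the index the constraint created.
MATCH (u:User {my_id: $id})
RETURN u;

// Restrict to a sub-label but keep :User in the pattern,
// so the :User(my_id) index remains usable.
MATCH (u:User:SubForum1User {my_id: $id})
RETURN u;

// Inspect the plan: expect an index seek on :User(my_id) followed by a label check.
PROFILE
MATCH (u:User:SubForum1User {my_id: $id})
RETURN u;
```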

Now suppose I create a full-text index over the full_name property, which all nodes with label User have:

CALL db.index.fulltext.createNodeIndex("nameTextIndex",["User"],["full_name"])

If I wanted to run a text query only on nodes with label SubForum1User, would it be efficient to use nameTextIndex, which was defined only for label User, together with the single global ID my_id?

Would the query planner be smart enough to still use the nameTextIndex efficiently even though we are anchoring to another label SubForum1User (remember that the nodes of SubForum1User also have label User)?
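One way to run a text query restricted to SubForum1User with the User-wide index is to filter the hits by label after the index call. This is a sketch assuming the nameTextIndex from the question and a made-up search term "john". The label check runs after the index returns its matches, so hits belonging to other labels are fetched and then discarded, and LIMIT applies only after that filtering.

```cypher
// Full-text lookup over all :User nodes, then keep only SubForum1User hits.
CALL db.index.fulltext.queryNodes("nameTextIndex", "john")
YIELD node, score
WHERE node:SubForum1User
RETURN node.my_id, node.full_name, score
ORDER BY score DESC
LIMIT 10;
```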

Thanks again.


Yes, you have a global ID for all users, but you must specify which property must be unique. It can be an id or something else, but the value must already be in your data. And of course, it remains an ordinary property, so you can still use it however you want.

If you look here, you can define a global search index (putting all your labels in it, for example), and then when you query it, you only specify the label and the property you need to search on :slight_smile:
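A sketch of a full-text index spanning several labels, in the Neo4j 4.x procedure form used earlier in the thread; the index name userNameIndex and the search term are hypothetical. The Lucene field syntax full_name:john restricts the match to that property, and the label is applied as a filter on the results. Since every node in this dataset already carries User, listing the other labels covers the same set of nodes; a multi-label list only widens coverage when some nodes lack a shared label.

```cypher
// One full-text index over several labels and the full_name property.
CALL db.index.fulltext.createNodeIndex(
  "userNameIndex",
  ["User", "ForumUser", "EmailUser", "SubForum1User", "SubForum2User"],
  ["full_name"]
);

// Search only full_name (Lucene field syntax), then keep one label.
CALL db.index.fulltext.queryNodes("userNameIndex", "full_name:john")
YIELD node, score
WHERE node:ForumUser
RETURN node.my_id, node.full_name, score;
```

In Neo4j 5 the index is created with CREATE FULLTEXT INDEX userNameIndex FOR (n:User|ForumUser) ON EACH [n.full_name] instead of the procedure.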