Optimise query for matching 2 labels with ordering

Hello,

I have a query that matches two labels and returns them sorted by name, but it can be slow for some accounts.
We are using Neo4j Aura and Neo4j version is 4.3.3

This is some sample of database structure:

(:User { firstName: "Name", "lastName": "Surname" })-[:SHARED_WORKSPACE]->(:Workspace)
(:Workspace)-[:HAS_CONTACT]->(:Contact { name: "Name Surname" })

The goal is to return both :User and :Contact labels sorted ASC by name

Current query (simplified):

MATCH (workspace:Workspace { id: "uuid" })
MATCH (contact)
WHERE (contact:User OR contact:Contact)
AND (
   EXISTS ((contact)-[:SHARED_WORKSPACE]->(workspace))
   OR EXISTS ((workspace)-[:HAS_CONTACT]->(contact))
)  
AND (contact.status = "CONFIRMED" OR contact.status IS NULL)
MATCH (creator:User { id: contact.created_by })
RETURN DISTINCT contact, workspace, creator
ORDER BY TOLOWER(COALESCE(contact.firstName + ' ' + contact.lastName, contact.name)) ASC, contact.id
SKIP 0
LIMIT 100

So for some workspaces that query can take between 3-5 seconds, which is too long. The goal is to make execution time less than 2 seconds.

When I profile, I can see that it starts from NodeByLabelScan on the whole database, while we only care about the labels that matches the relationships with matched workspace

I've tried different ways to change the query so it matches both labels with relationships, for example:

MATCH (workspace:Workspace { id: "uuid" })
OPTIONAL MATCH (contact)-[:SHARED_WORKSPACE]->(workspace) WHERE (contact:User OR contact:Contact)
OPTIONAL MATCH (workspace)-[:HAS_CONTACT]->(contact) WHERE (contact:User OR contact:Contact)
....

But it this case it will only match single label, depending which OPTIONAL MATCH will be first, either :User or :Contact.

Also it seems that ordering with COALESCE is not very productive:
ORDER BY TOLOWER(COALESCE(contact.firstName + ' ' + contact.lastName, contact.name)) ASC, contact.id
Unfortunately can't change the date, and User and Contacts has a bit different properties, so doing this to order by name.

Any suggestions?
Thanks

There are a few things about this query that trouble me.

First, you match to get a specific Workspace node, then you go search the universe of User and Contacts nodes that have relationships to the workspace node you already find. You should not do this, as you can limit your search to nodes related to your workspace node. There is no need to search among all the other User and Contact nodes in your database.

Second, you seem to be treating User and Contact nodes as if they are the same, as you are blending them in one query and returning a list where each row can have either type of node. Based on your issue with the coalesce, they don't even have the same name properties. You should not mix these and try to query for both at the same time. Instead, query for each specifically.

Third, you get a list of Contact nodes from your second match statement, then you use a property in each to find an associated User name, which is the creator. You should use relationships to relate entities instead of primary and foreign keys, as in a relational database. Also, you are blending User and Contact nodes here, as each has a created_by date that relates it to a User.

I don't know your data model, so I am not stating this is correct. but you should have a model like this that related entities via relationships. It is then easy to query for everything related to a workspace node.

In the meantime, this may help with your execution time. I can't do anything with your "join" match. You need to replace this with a relationship.

MATCH (workspace:Workspace { id: "uuid" })
WITH workspace,
[(contact:User)-[:SHARED_WORKSPACE]->(workspace) where contact.status = "CONFIRMED" OR contact.status IS NULL | contact] +
[(workspace)-[:HAS_CONTACT]->(contact:Contact) where contact.status = "CONFIRMED" OR contact.status IS NULL | contact] as contacts
UNWIND contacts as contact
MATCH (creator:User { id: contact.created_by })
RETURN DISTINCT contact, workspace, creator
ORDER BY TOLOWER(COALESCE(contact.firstName + ' ' + contact.lastName, contact.name)) ASC, contact.id
SKIP 0
LIMIT 100

Hi Gary,

Thanks! Your solution with joining contacts in list helped a lot. We didn't use such approach before.

That's a large production system, so unfortunately can't change data and yes, you are absolutely correct that User and Contact nodes are different and should not be returned together.
And in fact they are not used anywhere together in the web app, except one case there it was a business need to return a single list of both users and contacts (paginated).
That's already in production, so currently I was just looking into improving query performance and API response time.

Thanks again!

Understood. We got what we got. For the second match, do you have an index on ‘id’ property for User nodes?

What is not working with the ‘order by’ so we can try fixing that?

Hello again,

Everything is working fine now. I've noticed some slowness before for

COALESCE(contact.firstName + ' ' + contact.lastName, contact.name))

but that seems to be related to large dataset, because now after making suggested changes, the query execution time decreased significantly and no further improvements needed.

Thanks

1 Like