cancel
Showing results for 
Search instead for 
Did you mean: 

How to fix the un-optimal behavior of query engine

michalkomorowsk
Node Link

Context:

  • We have users.
  • Users can be members of groups.
  • Groups belong to tenants.
  • Tenants = business context.
  • Tenants form a hierarchy.

Query

Now we have the following simplified query:

PROFILE MATCH
(u:User {Id: '<USER_ID>'})-[:MEMBER_OF]->(g:Group),
(t:Tenant)-[:CHILD_OF*0..5]->(root:Tenant)<-[:BELONGS]-(g)
RETURN t.Id, g.Id

So we want to retrieve all the groups for the given user. However, groups can be inherited and it is why we traverse a hierarchy of tenants.

Issue

The query returns correct results but it is slow.I profiled the query and I believe that the problem is here:

2X_5_56d1cce3f57300d53381da2f1349c462d2ad2a7f.png

The engine firstly finds all possible combinations of users and groups what gives around 2 mln records and from all these records it selects only 5K for a given user.

From my perspective, it is not optimal. It would be much faster to start from a given user and find groups for it.

Do you have any suggestions? So far I tried to play with hints, different structure of query but all the time the query plan is the same.

1 ACCEPTED SOLUTION

michalkomorowsk
Node Link

I found out that described problem can be fixed by rewriting the query in the following way:

MATCH (u:Usert {Id: '<USER_ID>'})-[:MEMBER_OF]->(g1:Group)
WITH COLLECT(g1.Id) AS groups
MATCH(t:Tenant)-[:CHILD_OF*0..5]->(root:Tenant)<-[:BELONGS]-(g2)
WHERE g2.Id IN groups
RETURN t.Id, g2.Id

The main changes are:

  1. In line 2 I eagerly find ids of groups that I'm interested in.
  2. In line 4 I use found groups to filter results.

In this case, the engine does not try to generate all possible combinations of groups and users.

View solution in original post

4 REPLIES 4

clem
Graph Steward

Check this out. You can force an early match that filters out the most irrelevant results, then follow up with matches that are broader.

clem
Graph Steward

Also look here:

dominicvivek06
Graph Fellow

My first opinion - (good news)
Just looking at the two steps of the profile - i see that there is a huge pagecache hits and 0 misses, which means the page cache is used to cache the data and avoiding accessing disk which will typically make the query run faster.

If can you post the full query plan, the count of the nodes, your infrastructure settings, and neo4j version (enterprise / community) that would be helpful for me and others in the community to provide more insights.

michalkomorowsk
Node Link

I found out that described problem can be fixed by rewriting the query in the following way:

MATCH (u:Usert {Id: '<USER_ID>'})-[:MEMBER_OF]->(g1:Group)
WITH COLLECT(g1.Id) AS groups
MATCH(t:Tenant)-[:CHILD_OF*0..5]->(root:Tenant)<-[:BELONGS]-(g2)
WHERE g2.Id IN groups
RETURN t.Id, g2.Id

The main changes are:

  1. In line 2 I eagerly find ids of groups that I'm interested in.
  2. In line 4 I use found groups to filter results.

In this case, the engine does not try to generate all possible combinations of groups and users.