Slow query with very limited data and boolean = false

matt.cloyd · January 5, 2021, 6:40pm

Hey all, I'm running up against an unexpectedly slow Cypher query, and I've looked into several other posts but I'm still not sure what to do about my graph to make it more performant.

In my query, I'm trying to find "unreserved" siblings of the origin node. Essentially, I want to walk up a :CHILD relationship to the (one) parent node, then find all of that parent's children that are not reserved.

My data represents the structure of a document, which has nested levels:

Title
- Subtitle
  - Chapter
    - Subchapter
      - Part (and so on)

The data model is very simple:

(:Level)-[:CHILD]->(:Level)

And here's the query that's giving me trouble:

MATCH (origin { type: 'part', identifier: '182' })<-[:CHILD]-
      (parent)-[:CHILD]->(siblings)
WHERE siblings.reserved = false
RETURN siblings

When siblings.reserved = false is in the query, it takes > 5 seconds, even though there are only 12 nodes that need to be checked for whether or not they are reserved. When set to true, or when the WHERE clause is removed, the query takes about 20ms.

My understanding from this other post is that the issue is having to scan all possible nodes to negate them.

There there are only 12 siblings that could match this relationship, given an origin node representing of Part 182. However, the query planner does the siblings.reserved part of the query first, so it's checking ~52,000 nodes.

My question, then, is: what do I need to do to bound the query such that the 12 possible siblings are the only ones who are checked for reserved = false?

And/or is there another way to make this lookup significantly more performant, with respect to the boolean attribute?

matt.cloyd · January 5, 2021, 7:08pm

Happy to report I may have solved the issue!

To limit the siblings that get searched for reserved = false, I set up a subquery in the CALL block.

CALL {
  MATCH (origin { type: 'part', identifier: '182' })<-[:CHILD]-(parent)-[:CHILD]->(siblings)
  RETURN siblings
}
WITH siblings
WHERE siblings.reserved = false

RETURN siblings

The query planner shows this searching about ~475 nodes for the boolean attribute, instead of ~50,000.

clem · January 5, 2021, 7:11pm

[added]. Your solution is almost certainly better than mine. Very interesting.

[original suggestion]
One educated guess/suggestion I have, is to change the state of the property .reserved

Instead of .reserved being true or false, make the range of values to be true and null (doesn't exist) for false

Then do the test siblings.reserved is null instead of siblings.reserved = false

My guess is that Neo4J needs to see if the property exists and if it exists, then extract the value if does (true or false). By making it null, then you only have to test if the property exists, which should be faster (but I'm not sure by how much.). You can also make the more likely property (true or false) to be the nullable value.

Topic		Replies	Views
New Ph D student exploring graph query optimization Introduce-Yourself	2	627	January 7, 2021
How can I optimize my Cypher Queries? Cypher optimization , performance , cypher , knowledge-base	5	9254	February 9, 2021
Slow Cypher Query Help Needed! Cypher performance	6	936	January 16, 2020
A slow running cypher query Cypher performance	8	5171	February 10, 2020
Using indexed nodes and simple queries my cypher queries are still taking around 500ms. Can this be further optimized? Looking for advice Cypher performance , cypher	5	420	December 18, 2020

Get Certified in June!

Slow query with very limited data and boolean = false

Related topics