When a Cypher statement is first submitted, Neo4j attempts to determine whether the query is in the plan cache before planning it.
By default Neo4j keeps 1000 query plans in the cache, as set by the `conf/neo4j.conf` parameter `dbms.query_cache_size`.
This actually represents 2 query plan caches.
- The string cache
When Cypher is initially submitted, a hash is computed on the statement string as-is. Using the resulting hash value, Neo4j
attempts to determine whether the statement already exists in the plan cache; if it does, re-planning may not be necessary.
Note, however, that statements that are logically the same but differ in case will produce different hashes. The following
2 statements, though semantically equivalent, produce different hashes, and a replan may be necessary:
match (n) return count(n);
MATCH (n) return COUNT(n);
Additionally, statements that are logically the same but differ in whitespace or carriage returns will produce different hash values.
The following 3 statements produce different hashes, and a replan may be necessary:
MATCH (n) return COUNT(n);
MATCH (n)    return COUNT(n);
MATCH (n)
return COUNT(n);
Cypher statements prefaced by `PROFILE`/`EXPLAIN` will have the `PROFILE`/`EXPLAIN` removed before the statement is hashed.
The following 2 statements will hash to the same value:
MATCH (n) return COUNT(n);
PROFILE MATCH (n) return COUNT(n);
If the Cypher statement's hash value is not found in the first cache, Neo4j will then attempt to determine whether it is in the 2nd cache.
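The string-cache behaviour described above can be illustrated with a small Python sketch. It is only a toy model: the function name `toy_string_cache_key`, the use of SHA-256, and the case-insensitive prefix match are assumptions made for illustration, not Neo4j's actual implementation.

```python
import hashlib
import re

# ASSUMPTION: a leading PROFILE/EXPLAIN keyword is matched case-insensitively.
_PREFIX = re.compile(r"^\s*(?:PROFILE|EXPLAIN)\s+", re.IGNORECASE)


def toy_string_cache_key(statement):
    """Hash the statement text as-is, after dropping a leading PROFILE/EXPLAIN.

    Hypothetical helper: SHA-256 stands in for whatever hash Neo4j uses.
    """
    stripped = _PREFIX.sub("", statement, count=1)
    return hashlib.sha256(stripped.encode("utf-8")).hexdigest()


lower = toy_string_cache_key("match (n) return count(n);")
upper = toy_string_cache_key("MATCH (n) return COUNT(n);")
multiline = toy_string_cache_key("MATCH (n)\nreturn COUNT(n);")
profiled = toy_string_cache_key("PROFILE MATCH (n) return COUNT(n);")

print(lower == upper)      # differing case: different keys
print(upper == multiline)  # differing whitespace: different keys
print(upper == profiled)   # PROFILE prefix removed: same key
```

Because the key is derived from the raw text, any difference in case or whitespace is a cache miss in this first cache, which is why a second, AST-based cache is useful.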
- The AST cache
The Neo4j compiler parses the query from a string into an abstract syntax tree (AST), which is an object representation of the query.
The optimizer then normalizes the statement to make planning easier. For example:
match (n:Person {id:101}) return n;
will be normalized to
match (n:Person) where n.id={param1} return n; {param1: 101}
In this case Neo4j has moved the predicate `{id:101}` from the `MATCH` pattern to the `WHERE` clause, and has turned the literal
`101` into a parameter, e.g. `n.id={param1}`. Usage of parameters is further detailed in the Neo4j documentation on Cypher parameters.
The AST doesn't store information such as whitespace and the casing of keywords, and since it has been parameterized, literal values
can change and still produce the same AST.
This second query cache is keyed on this normalized AST, i.e. the following queries will reuse the same query plan:
match (n:Person) where n.id=101 return n;
match (n:Person {id:101}) return n;
MATCH ( n:Person { id : 101 } )
RETURN n;
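To make the idea of a normalized key more concrete, here is a rough Python sketch. It is a toy tokenizer, not Neo4j's parser: the name `toy_ast_key`, the small keyword set, and the `{paramN}` naming are assumptions for illustration. It only drops whitespace, lowercases keywords, and replaces integer literals with parameters; it does not move predicates from the `MATCH` pattern to the `WHERE` clause as the real normalization does.

```python
import re

# ASSUMPTION: a tiny keyword set, enough for the statements in this article.
KEYWORDS = {"match", "where", "return", "order", "by", "asc", "desc"}
# Integer literals, then identifiers/keywords, then any single punctuation character.
TOKEN = re.compile(r"\d+|\w+|[^\s\w]")


def toy_ast_key(statement):
    """Return (key, params): a whitespace-free, parameterized token tuple."""
    tokens = []
    params = {}
    for tok in TOKEN.findall(statement):
        if tok.isdigit():
            # Replace the literal with a parameter placeholder.
            name = "param%d" % (len(params) + 1)
            params[name] = int(tok)
            tokens.append("{" + name + "}")
        elif tok.lower() in KEYWORDS:
            # Keyword casing is not significant.
            tokens.append(tok.lower())
        else:
            # Labels, variables and property names keep their case.
            tokens.append(tok)
    return tuple(tokens), params


key_a, params_a = toy_ast_key("match (n:Person) where n.id=101 return n;")
key_b, params_b = toy_ast_key("MATCH ( n:Person )\nWHERE n.id = 202\nRETURN n;")
print(key_a == key_b)       # same key despite case, whitespace and literal value
print(params_a, params_b)   # the literals travel separately as parameters
```

In this sketch the two statements share one key while their literal values are carried in the parameter map, which mirrors why a single cached plan can serve both.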
Finally, should the Cypher statement be found in either the 1st or 2nd cache, the query may still be subject to replanning
based upon the `conf/neo4j.conf` parameters `cypher.min_replan_interval` and `cypher.statistics_divergence_threshold`.
`cypher.min_replan_interval` defines the duration, defaulting to 10 seconds, that a cached plan must exist before it is eligible for replanning.
`cypher.statistics_divergence_threshold` defines how much the statistics for the underlying data used by the Cypher statement may change
before a cached plan is considered stale. The default value is 0.75, which means that if the statistics have changed by more than 75%
since the last time the cached plan was generated, then a new plan needs to be generated.
For example, consider running:
// remove all :Person nodes
match (n:Person) detach delete n;
// create 10 :Person nodes
foreach (x in range (1,10) | create (n:Person {id:x}));
// list the 10 :Person nodes created
match (n:Person) return n.id order by n.id desc;
// create 8 new :Person nodes
foreach (x in range (11,18) | create (n:Person {id:x}));
// list the 18 :Person nodes
match (n:Person) return n.id order by n.id desc;
Both `match (n:Person) return n.id order by n.id desc;` statements would be planned. Although the 2nd instance has the same
hash value as the 1st, the statistics on :Person changed from 10 nodes to 18 nodes, thus exceeding the 75% change threshold.
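The interplay of the two settings can be sketched in Python as follows. This is a simplified model under stated assumptions: the function `toy_needs_replan` is hypothetical, and divergence is assumed to be the relative change in node count measured against the count at planning time. Neo4j's internal calculation may differ.

```python
def toy_needs_replan(plan_age_seconds, count_at_planning, count_now,
                     min_replan_interval=10.0, divergence_threshold=0.75):
    """Return True if a cached plan would be treated as stale in this toy model."""
    # A plan younger than the minimum interval is not eligible for replanning.
    if plan_age_seconds < min_replan_interval:
        return False
    # ASSUMPTION: divergence = relative change against the count at planning time.
    if count_at_planning == 0:
        return count_now != 0
    divergence = abs(count_now - count_at_planning) / count_at_planning
    return divergence > divergence_threshold


# The example above: 10 :Person nodes at planning time, 18 now.
print(toy_needs_replan(30, 10, 18))  # old enough and diverged
print(toy_needs_replan(5, 10, 18))   # diverged, but the plan is too young
print(toy_needs_replan(30, 10, 12))  # old enough, but statistics barely changed
```

Under these assumptions, going from 10 to 18 nodes is a relative change of 0.8, which is above the default threshold of 0.75, so only the first call reports a stale plan.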
If an existing plan needs to be replanned as a result of the above 2 parameters, `logs/debug.log` will log:
2017-03-31 19:14:27.820+0000 INFO [o.n.c.i.ExecutionEngine] Discarded stale query from the query cache: match (n:Person) return n.id order by n.id desc;
2017-03-31 19:14:27.821+0000 INFO [o.n.c.i.EnterpriseCompatibilityFactory] Discarded stale query from the query cache: match (n:Person) return n.id order by n.id desc;
Additionally, note that when a query plan is removed from the cache to make room for a new plan, a least frequently
used (LFU) algorithm is applied. So if the first query added to the plan cache is run every 1 second, and the 2nd query added to the
plan cache is run every 2 minutes, then when a query plan must be removed from the cache to make room for a new query, the 2nd query
is removed before the 1st, since the first is called upon more frequently.
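A minimal LFU eviction sketch in Python shows the effect described above. The class `ToyLfuPlanCache` and its methods are hypothetical and greatly simplified (a plain use counter per entry); they are not Neo4j's cache implementation.

```python
class ToyLfuPlanCache:
    """Toy least-frequently-used cache: evicts the entry with the fewest uses."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.plans = {}  # query key -> plan
        self.uses = {}   # query key -> number of lookups

    def get(self, key):
        if key not in self.plans:
            return None
        self.uses[key] += 1
        return self.plans[key]

    def put(self, key, plan):
        if key not in self.plans and len(self.plans) >= self.capacity:
            # Evict the least frequently used entry to make room.
            victim = min(self.uses, key=self.uses.get)
            del self.plans[victim]
            del self.uses[victim]
        self.plans[key] = plan
        self.uses.setdefault(key, 0)


cache = ToyLfuPlanCache(capacity=2)
cache.put("query1", "plan1")
cache.put("query2", "plan2")
for _ in range(120):          # query1 runs every second for two minutes
    cache.get("query1")
cache.get("query2")           # query2 runs once in the same period
cache.put("query3", "plan3")  # cache is full, so one plan must go
print(sorted(cache.plans))    # query2 was evicted, query1 survives
```

A production cache would typically also age the counters so that a plan that was once popular does not stay cached forever; that refinement is left out here to keep the sketch short.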
Finally, note that any schema change, for example the creation or removal of an index or constraint, will flush the entire query plan
cache.