Help needed with query replanning and divergence threshold

Using Neo4j 4.4 Enterprise
Could somebody please explain in more detail the use of

  • cypher.min_replan_interval

  • cypher.statistics_divergence_threshold

I have read the documentation but have a few questions. We are loading large numbers of records via a script that calls an API on a Java application, which builds Cypher queries to insert/update data. I know we could potentially use an import tool, but this is what we have for now. Over time during the data load, performance degrades, and there are quite a few messages in the log like:

Discarded stale query from the query cache after 28764 seconds. Reason: CardinalityByLabelsAndRelationshipType(None,Some(RelTypeId(0)),None) changed from 1473153.0 to 1636161.0, which is a divergence of 0.09962833730910345 which is greater than threshold 0.08907565384229182. Query id: 5409299

Settings for the two values are at their defaults (checked as shown below).
I see the threshold go up and down in these messages; what causes it to go up? I thought it would gradually reduce over time.
What are the implications of setting cypher.min_replan_interval to a very large value just during this data loading phase?

The documentation states the threshold will reduce to 10% over 7 hours, but how can that be the case if the replan interval is many hours?
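
For reference, this is how we confirmed the current values on the running instance. It is just a minimal check using the built-in dbms.listConfig procedure; the 'cypher' filter string is simply what we used to narrow the output.

  // list the cypher.* settings and their effective values
  CALL dbms.listConfig('cypher')
  YIELD name, value
  RETURN name, value
  ORDER BY name;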

Thanks for any advice

@swellard

Neo4j 4.4.x? Do you know which patch release, i.e. what x is?

The message

Discarded stale query from the query cache after 28764 seconds. Reason: CardinalityByLabelsAndRelationshipType(None,Some(RelTypeId(0)),None) changed from 1473153.0 to 1636161.0, which is a divergence of 0.09962833730910345 which is greater than threshold 0.08907565384229182. Query id: 5409299

simply indicates that the query plan currently in the query plan cache has been determined to need replanning given the change in data, and the message is indicative of significant data 'movement'. But planning is generally a quick operation, i.e. < 1 s. So even if we eliminated this message and did not replan, you would presumably only be saving < 1 s, if that. I'm just not sure how a replan event would be contributing to a significant slowdown.
I would be more concerned with making sure the queries are optimal and have supporting indexes.
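
If you want to gauge how much planning actually costs for one of your queries, one option (a hedged suggestion; db.clearQueryCaches is a built-in procedure in 4.x) is to clear the query caches and compare the first run, which must plan from scratch, against a subsequent cached run:

  // flush cached query plans so the next execution is planned from scratch
  CALL db.clearQueryCaches();

  // time this first (cold) run, then run it again; the difference in elapsed
  // time approximates the planning cost (the query itself is just a stand-in)
  MATCH (n:Person {name:'Dana'})
  RETURN n;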

For example, a query such as

MERGE (n:Person {name:'Dana'}) SET n.status='active' RETURN n;

and if there are currently 100 million :Person nodes, the above MERGE could take many seconds with no supporting index, but with an index it would take < 1 s.
So even if we replanned this query every time, with no index it is many seconds plus replan time, versus < 1 s plus replan time with an index.
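
For illustration, a supporting index for that MERGE could be created as below (the index name person_name is just illustrative):

  // back the MERGE lookup on :Person(name) with an index
  CREATE INDEX person_name IF NOT EXISTS
  FOR (n:Person) ON (n.name);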

Thank you for the reply
We are using Enterprise version 4.4.18
We have seen the opposite when it comes to planning with some of our queries. Some have very complex projections in the RETURN clause, and we have seen 30 s or more just to get a query plan, hence the thought that maybe replanning was slowing things down.
We do have many indexes; are you suggesting that going through the queries and looking for opportunities to create more indexes would speed up the planning process, even for those with complicated projections?

Although the script is doing PUT requests to our API, each request involves several queries before a record is created, for example to validate the data against existing records and to find the nodes to relate to. I realise it is hard to make recommendations without knowing the queries or schema, but is a lack of good indexes the biggest factor at play in slow planning?
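
For context, this is roughly how we have been inspecting things so far (a minimal sketch; the MATCH is just a stand-in for one of our real validation queries):

  // confirm which indexes the planner has available
  SHOW INDEXES;

  // view the chosen plan without executing the query
  EXPLAIN
  MATCH (n:Person {name:'Dana'})
  RETURN n;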
Many thanks

@swellard

Given that this is Enterprise, and if you have an Enterprise license, then this may be better served via support.neo4j.com.