An effort to better understand the underlying mechanism of query filters and profiling

Hello,

I always find myself confused when profiling queries and find that at some point the number of DB hits increase somewhere along the execution path.

Take into consideration the following query:

MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
WHERE t.iscore IS NOT NULL
		AND t.iscore >= $iScoreFrom
		AND t.iscore <= $iScoreTo
		AND t.followers >= $followersFrom
		AND t.followers <= $followersTo
		AND t.avg_video_views >= $videoViewsFrom
		AND t.avg_video_views <= $videoViewsTo
		AND t.avg_engagement >= $avgEngagementFrom
		AND t.avg_engagement <= $avgEngagementTo
		AND r.affinity > 0 // <----
RETURN t AS talent, r AS belongsTo
ORDER BY r.affinity DESC
SKIP $offset LIMIT $limit;

Here is a screenshot of the query PROFILE

  1. I would like to understand how come after we've matched t based on the above query params we still have s and/or r that spans beyond the matched t pool?
  2. What is the correlation between DB hits and rows? if there is any.

I think of a filter statement to have exclusive nature, in a sense that, once we've matched an edge of a relation, only the corresponding "neighbours" would be available for the rest of the query. Think of it as a funnel, the more we advance into the execution, in this case, the more pruning should be done along the way.

But as we can observe when we reach the AND r.affinity > 0 filter the pool of ts increase drastically.
I'd like to add that this query is running on a large data set and it takes anywhere between 40s to 1min to execute. We suspect that the query's poor performance is specifically due to AND r.affinity > 0.

Thoughts?

Thanks in advance.

1 Like

Hi, perhaps this explanation helps.

Since you have an index on AudienceSegment.checksum, the "anchor" of your query, or the node that is found first is s.
From there, it expands out through the BELONGS_TO relationship type to find the ts on the other end. This is the first Expand (All) you see in the plan.
There's a filter on the relationship r and this checking incurs db hits as each relationship r has to be checked for the property affinity being greater than 0. At the end of the filter step, you see the number of rows has indeed reduced.
From that point on, there is no further expansion and just projections and sorting required for your results.

Depends on the goal of your query and the graph model, but the more BELONGS_TO relationships that exist between any given pair of nodes, the more your query is going to cost because of the relationship property check.

1 Like

Hi Luanne,

Thanks for your prompt reply and explanation.
I would like to emphasis that we are also filtering ts, I expect it to reduce the number of matched rs.
Do you mean that the relation filter takes precedence over the nodes filter on the other end?

To avoid this kind of expansion I was considering splitting the WHERE clause into two and pipe results from one to another, something of the sort:

MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
WHERE t.iscore IS NOT NULL
		AND t.iscore >= $iScoreFrom
		AND t.iscore <= $iScoreTo
		AND t.followers >= $followersFrom
		AND t.followers <= $followersTo
		AND t.avg_video_views >= $videoViewsFrom
		AND t.avg_video_views <= $videoViewsTo
		AND t.avg_engagement >= $avgEngagementFrom
		AND t.avg_engagement <= $avgEngagementTo
WITH s, r, t
WHERE r.affinity > 0 // <----
RETURN t AS talent, r AS belongsTo
ORDER BY r.affinity DESC
SKIP $offset LIMIT $limit;

But this had no effect on query performance.

Does this split make any sense in your opinion?
Thanks again.

No, moving the r.affinity filter to after the WITH will not make a difference because the relationships have already been expanded into. If you know for a fact that it would be cheaper to expand from t to s then perhaps you can play with some hints https://neo4j.com/docs/cypher-manual/current/query-tuning/using/ ?

1 Like

Hi again,

Thanks Luanne for the follow up.

I've been experimenting a few but I am not confident I am heading in the right path, so i'd like to add some more context to our use case and share what i've been trying so far.

Context

We have a fairly average size data set, ~700k nodes and ~100M relations which we are looking forward to grow by two or three folds by end of year.
We query this data set in a real-time setting, we can tolerate no more than 5s response time.

  1. The count of s nodes is 30 times more than that of t nodes.
  2. We cannot index hint t nodes since these can only be filtered by the query params provided above.
  3. Our current setup is as follows:
    • Driver: using official Golang driver
    • Neo4j causal cluster of 3 core and 2 read replicas. The cluster is managed by Kubernetes, so Neo4j is hosted within Docker containers and not on the machine directly
      • Each core has a maximum of 3CPU units and 20Gi RAM with configuration of heap: heap.max_size and heap.initial_size are 13Gi
      • Read replicas are configured with a maximum of 3CPU units and 18Gi RAM (read replicas are irrelevant to this question)
  4. All our indexes are native.

Experimentation

Avoid relation property query

We removed the infamous r.affinity > 0 where clause in favour of an additional property filter t.is_identification_enabled = true on the t node itself, leveraging the following query:

MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
WHERE t.iscore IS NOT NULL
		AND t.iscore >= $iScoreFrom
		AND t.iscore <= $iScoreTo
		AND t.followers >= $followersFrom
		AND t.followers <= $followersTo
		AND t.avg_video_views >= $videoViewsFrom
		AND t.avg_video_views <= $videoViewsTo
		AND t.avg_engagement >= $avgEngagementFrom
		AND t.avg_engagement <= $avgEngagementTo
		AND t.is_identification_enabled = true	
RETURN t AS talent, r AS belongsTo
ORDER BY t.iscore ASC
SKIP $offset LIMIT $limit;

With no difference whatsoever in query performance. Got to mention that we do allow to ORDER BY r.affinity.
Results: Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 131321 total db hits in 8330 ms. Profile

Add JOIN hint

The only hint I think we can provide to the query planner is a Join hint, since we cannot hint any index on t nodes for the aforementioned reasons. Nonetheless I did try other hints mainly, SCAN hints. Leading to the following query:

MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
USING JOIN ON t
WHERE t.iscore IS NOT NULL
		AND t.iscore >= $iScoreFrom
		AND t.iscore <= $iScoreTo
		AND t.followers >= $followersFrom
		AND t.followers <= $followersTo
		AND t.avg_video_views >= $videoViewsFrom
		AND t.avg_video_views <= $videoViewsTo
		AND t.avg_engagement >= $avgEngagementFrom
		AND t.avg_engagement <= $avgEngagementTo
		AND t.is_identification_enabled = true	
RETURN t AS talent, r AS belongsTo
ORDER BY t.iscore ASC
SKIP $offset LIMIT $limit;

One thing to note is that with the JOIN hint the number DB hits skyrocketed but seems that response time has been reduced, or at leased that is the impression we got from profiling.
Results: Cypher version: CYPHER 3.5, planner: COST, runtime: SLOTTED. 544294 total db hits in 5295 ms. Profile
Once deployed to our testing environment we could not note any improvement.

I find it very hard to explain how the latter query, using JOIN hint is more performant despite the huge number of DB hits, ~78% of our entire node population!

One thing that have been a hurdle to workaround is cached query results, we are not able to benchmark different approaches to query tuning due to that. We could not find a way to disable caching, not sure if this CYPHER replan=force is the way to go, but it is only supported in v4.x.

That said

  1. Do you think I should try something else that might have slipped unnoticed? What would that be?
  2. Please elaborate on the more performant yet less cost effective JOIN hint query, how is it possible and/or whether my understanding of profiling results are wrong?
  3. Please advise on how to disable caching results, for a better benchmarking experience.

Thanks!
Have a nice day.
k.

Can you also check your memory configuration, esp. for HEAP and Pagecache? And the pagecache-hits/misses on your query that should also show up in profile? I would probably give a bit more to the page-cache than the heap, so that at least your nodes, their properties and the relationship store and the indexes are mapped. you can use apoc.warmup.run(true,false,true) to load the page-cache.

Using JOIN should help you as it can do two index lookups, esp. if you force an index use of Talent(iscore) then it can even keep the sorting from the index (at least in Neo4j 4.x, not sure if that was already in 3.5)

Ah, just looked at your PROFILE it's already doing that.
So please check the page-cache settings.

What is your SKIP/LIMIT size?

Something else that I would check is if there are any properties that are large that you're returning to the client?

Does it show the same behavior when you run it outside of your cluster with the same memory settings on a single instance / localhost?

Also how selective is iscore? Is it correct that you get 58.5k results from the index? For iscoreFrom < iscore < iscoreTo ?

The other thing that's odd is that all your property filters only reduce that node-count from 58k to 56k ?
Are some of those properties dependent?

You could try to move the filter those additional properties after the sort just before the skip/limit.

2 Likes

Hello Michael,
Thanks for your reply.

These were our configs:

  - name: NEO4J_dbms_connector_bolt_thread__pool__min__size
    value: "50"
  - name: NEO4J_dbms_connector_bolt_thread__pool__max__size
    value: "1000"
  - name: NEO4J_dbms_connector_bolt_thread__pool__keep__alive
    value: "1m"
  - name: NEO4J_dbms_memory_heap_max__size
    value: "12G"
  - name: NEO4J_dbms_memory_heap_initial__size
    value: "4G"
  - name: NEO4J_causal__clustering_log__shipping__max__lag
    value: "512"

Our pagecache setting was set to 512M.

Using JOIN should help you as it can do two index lookups, esp. if you force an index use of Talent(iscore)

Will add another Index hint.

What is your SKIP/LIMIT size?

Anywhere from 20-50 records.

Something else that I would check is if there are any properties that are large that you're returning to the client?

Not really, all values are either numeric or short strings.

Does it show the same behavior when you run it outside of your cluster with the same memory settings on a single instance / localhost?

Yes!

Also how selective is iscore? Is it correct that you get 58.5k results from the index? For iscoreFrom < iscore < iscoreTo ?
The other thing that's odd is that all your property filters only reduce that node-count from 58k to 56k ?
Are some of those properties dependent?

In the queries above we only filter by iScore, hence the low diff in filtered nodes count. It is expected, since we are trying to address edge cases in our testing.

You could try to move the filter those additional properties after the sort just before the skip/limit.

Will give it a try.

We ran neo4j-admin memrec and we got back the following recommendation, which we applied and noticed some improvements:

#
# Assuming the system is dedicated to running Neo4j and has 26100m of memory,
# we recommend a heap size of around 9000m, and a page cache of around 10600m,
# and that about 6500m is left for the operating system, and the native memory
# needed by Lucene and Netty.
#
# Tip: If the indexing storage use is high, e.g. there are many indexes or most
# data indexed, then it might advantageous to leave more memory for the
# operating system.
#
# Tip: The more concurrent transactions your workload has and the more updates
# they do, the more heap memory you will need. However, don't allocate more
# than 31g of heap, since this will disable pointer compression, also known as
# "compressed oops", in the JVM and make less effective use of the heap.
#
# Tip: Setting the initial and the max heap size to the same value means the
# JVM will never need to change the heap size. Changing the heap size otherwise
# involves a full GC, which is desirable to avoid.
#
# Based on the above, the following memory settings are recommended:
dbms.memory.heap.initial_size=9000m
dbms.memory.heap.max_size=9000m
dbms.memory.pagecache.size=10600m

Will keep you posted.

Thanks!

How big is your store on disk? I guess 512M page-cache is way too small.

Would you be able to share the data with me for some internal testing?

Please also consider upgrading to 4.1 for quite a number of improvements in the Cypher runtime.

Its ~20Gb, I presume it is too small yes, increased to 11Gi.

Would you be able to share the data with me for some internal testing?

Will send you a link to a backup on Slack.

Please also consider upgrading to 4.1 for quite a number of improvements in the Cypher runtime.

We will, had a conversation earlier with Abed about the upgrade, we have no reason not to.

Cheers.

Following up on some of the earlier suggestions:

Using JOIN should help you as it can do two index lookups, esp. if you force an index use of Talent(iscore)

Tried to add an additional Index hint to Talent(iscore) but since we already have a JOIN hint on the same node, we got Multiple hints for same variable are not supported.

You could try to move the filter those additional properties after the sort just before the skip/limit.

Tried to do this too, but got syntax errors. Honestly i wasn't aware that we could use WHERE statements after the RETURN statement, is this what you meant or did I get it wrong?

I should also note that we have a read replica with 18 Gb of RAM but after retrieving configs it turned out to have the default heap and pagecache values of 512M.
We applied the same resource recommendation to the read replica, so that heap_initial_size and heap.max_size are set to 9Gi, pagecache.size to 11Gi, but hardly noticed any improvement!

I understand that how requests are rerouted depends on how we use the driver but what would be the optimal approach to that, route all read to read queries to read replicas while avoiding followers?

there are routing policies that are routing only to RR but I haven't used them myself, should be in the docs.

Yes you should check the config on all cluster members.
Did you run multiple queries or just one?

I tested it on 3.5 and 4.1 with 14G page-cache and had 50-200ms (worst case) results.

// 73ms
MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
WHERE t.iscore IS NOT NULL
		AND t.iscore >= $iScoreFrom
		AND t.iscore <= $iScoreTo
		AND t.followers >= $followersFrom
		AND t.followers <= $followersTo
		AND t.avg_video_views >= $videoViewsFrom
		AND t.avg_video_views <= $videoViewsTo
		AND t.avg_engagement >= $avgEngagementFrom
		AND t.avg_engagement <= $avgEngagementTo
WITH s, r, t
WHERE r.affinity > 0 // <----
RETURN t AS talent, r AS belongsTo
ORDER BY r.affinity DESC
SKIP $offset LIMIT $limit;

// page-cache 14G
// heap 16G

:param avgEngagementFrom => 0
:param avgEngagementTo   => 1000000000000
:param checksum          => "ba92565be35fde27727bd375b9e67c35"
:param followersFrom     => 0
:param followersTo       => 1000000000000
:param iScoreFrom        => 2
:param iScoreTo          => 10
:param limit             => 15
:param offset            => 0
:param videoViewsFrom    => 0
:param videoViewsTo      => 1000000000000

MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
USING JOIN ON t
WHERE $iScoreFrom <= t.iscore <= $iScoreTo
WITH t,r
ORDER BY t.iscore ASC
SKIP $offset LIMIT $limit
WITH t,r
WHERE $followersFrom <= t.followers <= $followersTo
AND $videoViewsFrom <= t.avg_video_views <= $videoViewsTo
AND $avgEngagementFrom <= t.avg_engagement <= $avgEngagementTo
RETURN t AS talent, r AS belongsTo
ORDER BY t.iscore ASC
SKIP $offset LIMIT $limit;
// 67ms

MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
USING JOIN ON t
WHERE $iScoreFrom <= t.iscore <= $iScoreTo
AND $followersFrom <= t.followers <= $followersTo
AND $videoViewsFrom <= t.avg_video_views <= $videoViewsTo
AND $avgEngagementFrom <= t.avg_engagement <= $avgEngagementTo
RETURN t AS talent, r AS belongsTo
ORDER BY t.iscore ASC
SKIP $offset LIMIT $limit;
// 206ms
// 120ms in 4.1


cypher expressionEngine=compiled
MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
WHERE $iScoreFrom <= t.iscore <= $iScoreTo
AND $followersFrom <= t.followers <= $followersTo
AND $videoViewsFrom <= t.avg_video_views <= $videoViewsTo
AND $avgEngagementFrom <= t.avg_engagement <= $avgEngagementTo
RETURN t AS talent, r AS belongsTo
ORDER BY t.iscore ASC
SKIP $offset LIMIT $limit;
// 55ms in 4.1

Hello Michael,

We've been trying to replicate your results with no success.

The first time we run a query, its response time vary between 20s - 75s, any consecutive queries take ~ 200ms.

System

Node: Google Cloud 2 CPU - 32Gi RAM ( 29Gi allocatable)
Docker container: 26Gi RAM
Neo4j: v4.1.1

Configs

Sysinfo

Since we shared with you the latest backup of our store it has now grown to ~715k nodes and ~185M relations.

Benchmarking with one of the above queries:

PROFILE MATCH (s:AudienceSegment {checksum: $checksum})<-[r:BELONGS_TO]-(t:Talent:Instagram)
USING JOIN ON t
WHERE $iScoreFrom <= t.iscore <= $iScoreTo
WITH t,r
ORDER BY t.iscore ASC
SKIP $offset LIMIT $limit
WITH t,r
WHERE $followersFrom <= t.followers <= $followersTo
AND $videoViewsFrom <= t.avg_video_views <= $videoViewsTo
AND $avgEngagementFrom <= t.avg_engagement <= $avgEngagementTo
RETURN t AS talent, r AS belongsTo
ORDER BY t.iscore ASC
SKIP $offset LIMIT $limit;

Results

Checksum: 52cd37fc3f4d3ab81f85f5d4cf03bfb3
Profile
Results - 52cd37fc3f4d3ab81f85f5d4cf03bfb3 - 1 - 1st