Optimising property access

Hello there

I was reading the optimising property access chapter in the Query Tuning course, and I wanted to double-check something with you:

If the first elapsed time is the time for the query to execute, and the second is the time for all of the query results to be streamed to the end client (through a network or not), does that mean:

If there is an eager aggregation or operation in the query, the gap between the first and second elapsed times will be shorter, because the query will have had to process almost every single row (depending on the query) before being able to stream the first result?

If there is no eager operation or aggregation at all (and no implicit infinite-loop protection), the query results can be produced for a single row immediately, which would explain why some queries show a 1 ms query time but usually a much longer time to stream the results.
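To make sure I'm understanding correctly, here's a toy Python sketch (plain Python, not Neo4j internals) of what I mean: an eager aggregation has to consume every input row before its first output row exists, while a lazy pipeline can yield its first row right away.

```python
def rows():
    # Stands in for a stream of rows produced by earlier operators.
    for i in range(1_000_000):
        yield i

def lazy_pipeline():
    # Each output row depends only on one input row,
    # so results can start streaming immediately.
    for r in rows():
        yield r * 2

def eager_pipeline():
    # collect()-style aggregation: every input row must be
    # consumed before the single output row can be emitted.
    total = sum(rows())
    yield total

# First row from the lazy pipeline arrives after touching one input row;
# first row from the eager pipeline arrives only after reading all rows.
first_lazy = next(lazy_pipeline())
first_eager = next(eager_pipeline())
```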

Thanks

Hi Gabriel,

That's my general understanding.

As you said, it does depend upon the query. Now that we have subqueries (introduced in Neo4j 4.1), we have some additional flexibility. Subqueries can be used to scope aggregations (they are called per row), so aggregations within a subquery do not require processing all input rows outside of the subquery.

As a quick example:

MATCH (m:Movie)
CALL {
 WITH m
 MATCH (m)<-[:ACTED_IN]-(actor:Person)
 RETURN collect(actor) AS actors
}
RETURN m, actors

Because the subquery executes per row (in this case, per m), the first execution of a collect() happens for the expansion on the first movie node, and doesn't require any processing on any other movie node in order to complete.

The data being aggregated is also much less, and will complete quicker. In this case it will be the actors for an individual movie, not all actors for all movies. We're trading off a single large aggregation across the entire input set, for many aggregations that each execute on much smaller input sets.
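For comparison, the same aggregation without a subquery might look like this (a sketch against the standard movies example dataset):

```cypher
// Without a subquery, collect() is a single eager aggregation over
// every (movie, actor) row: no movie can be returned until all
// movies have been expanded and grouped.
MATCH (m:Movie)<-[:ACTED_IN]-(actor:Person)
RETURN m, collect(actor) AS actors
```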

I do want to quickly say that with Cypher alone you can't get caught in an infinite loop, as the relationship isomorphism used by Cypher prevents a relationship from being used more than once per path.
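For example, even a variable-length pattern over a cyclic graph terminates, because each relationship can appear at most once per matched path (a sketch; the label and relationship type are just illustrative):

```cypher
// Even if the KNOWS relationships form a cycle, this cannot loop
// forever: relationship isomorphism stops a path from reusing the
// same relationship, so every path is finite.
MATCH p = (a:Person)-[:KNOWS*]->(b:Person)
RETURN p
```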

Thank you @andrew.bowman

Great and precise answer; your subquery example should be added to the Reducing Cardinality chapter. I understood something important about subqueries just from this example.

As I understood it, the subquery version will allow streaming to start sooner than the regular version, which will wait for all the actors of all the movies to be aggregated before streaming a single row to the client.

But I can't tell whether that's actually the case when I look at the query plan.

Is there any way to monitor the behaviour you just explained?

I'm not aware of a way to monitor it.

For plan analysis, the key is recognizing the Apply operator, which indicates that the right-hand steps in the plan are executed per incoming row from the left-hand side.

From this, we can tell that the EagerAggregation is on the right-hand side of the Apply, which verifies the scoping of the aggregation. As soon as the first row from the label scan finishes its run through the Apply, the results from it are ready (and then it's up to the network and driver code to decide when/how to stream the results).

As there are no further aggregations happening outside the Apply, we know that we don't have to wait for ALL actors involved in the query to aggregate, so we end up streaming the data sooner.
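You can see this for yourself by prefixing the query with PROFILE and inspecting the resulting plan (a sketch, reusing the earlier example):

```cypher
// The profiled plan should show an Apply operator with the
// EagerAggregation on its right-hand side: one small aggregation
// per movie, rather than one big aggregation at the end.
PROFILE
MATCH (m:Movie)
CALL {
  WITH m
  MATCH (m)<-[:ACTED_IN]-(actor:Person)
  RETURN collect(actor) AS actors
}
RETURN m, actors
```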


I just finished the Query Tuning course. I guess if I want to become a real query-tuning master, my best option is to read more about each operator in the manual.

Thank you again, have a nice day!