Partial/Whole computation of results with Limit operator

hm873154 · March 29, 2023, 3:38pm

Hi,

I often execute my Cypher queries with the Limit clause since I work with large data graphs. The Limit clause improves significantly the matching time of my queries, but I am wondering if the whole result set is computed first and then only some rows (indicated by Limit) are returned, or the Neo4j computes only necessary matches that fulfill the value of Limit. I tried to find a response by using "Profile" but the result are the same excepting the last phase of projection.

Is there some kind of query where the whole result set is computed even by using the Limit clause ?

Thank you for your help

elaine_rosenber · March 29, 2023, 4:02pm

Hello @hm873154 ,

How much work done in the query depends on where you place the LIMIT clause.

If you place it early in the query, then it can limit the number of rows computed.

If you place it in the RETURN clause, it only limits the number of rows returned.

hm873154 · March 29, 2023, 6:05pm

Thank you for your help.

Is there some cases where the limit clause reduces number of examined result rows ? Maybe by using indexes?

For instance, given the following query:

Match (p: Person) return p order by p.age limit 10

By using index on property age, can the query be executed only over the ten first rows rather than all Persons?

Otherwise, how can I place optimally the limit clause ?

For instance, my query is as follows:

Match (p1:Person)-[: knows]-(p2:Person)-[: knows]-(p3:Person) with p1,p2,p3, (p1.experience+p2.experience+p3.experience)/3 as Exp return p1,p2,p3 order by Exp desc limit 20

Fir this query, even I use limit or not the execution time is almost the same !!! Is there another manner to use optimally the limit clause as you suggested ?

dana_canzano · March 29, 2023, 8:46pm

@hm873154

this is easily demonstatrable and with simple cypher queries, and in this case using Neo4j 5.5.0, though version should not be centric to the issue

 ./cypher-shell
Connected to Neo4j using Bolt protocol version 5.0 at neo4j://localhost:7687.
Type :help for a list of available commands or :exit to exit the shell.
Note that Cypher queries must end with a semicolon.
@neo4j> create database orderBy;
0 rows
ready to start consuming query after 1929 ms, results consumed after another 0 ms

@neo4j> :use orderBy;
@orderby> unwind range(1,100) as x create (n:Person {age:x});
0 rows
ready to start consuming query after 149 ms, results consumed after another 0 ms
Added 100 nodes, Set 100 properties, Added 100 labels
@orderby> profile match (n:Person) return n.age order by n.age limit 10;
+-------+
| n.age |
+-------+
| 1     |
| 2     |
| 3     |
| 4     |
| 5     |
| 6     |
| 7     |
| 8     |
| 9     |
| 10    |
+-------+

+---------------------------------------------------------------------------------------------------+
| Plan      | Statement   | Version | Planner | Runtime     | Time | DbHits | Rows | Memory (Bytes) |
+---------------------------------------------------------------------------------------------------+
| "PROFILE" | "READ_ONLY" | ""      | "COST"  | "PIPELINED" | 213  | 301    | 10   | 3920           |
+---------------------------------------------------------------------------------------------------+


Planner COST

Runtime PIPELINED

Runtime version 5.5

Batch size 128

+------------------+----+----------------------+----------------+------+---------+----------------+------------------------+-----------+-------------+---------------------+
| Operator         | Id | Details              | Estimated Rows | Rows | DB Hits | Memory (Bytes) | Page Cache Hits/Misses | Time (ms) | Ordered by  | Pipeline            |
+------------------+----+----------------------+----------------+------+---------+----------------+------------------------+-----------+-------------+---------------------+
| +ProduceResults  |  0 | `n.age`              |             10 |   10 |       0 |                |                    0/0 |     0.197 |             |                     |
| |                +----+----------------------+----------------+------+---------+----------------+------------------------+-----------+             |                     |
| +Top             |  1 | `n.age` ASC LIMIT 10 |             10 |   10 |       0 |           3856 |                    0/0 |     9.301 | `n.age` ASC | In Pipeline 1       |
| |                +----+----------------------+----------------+------+---------+----------------+------------------------+-----------+-------------+---------------------+
| +Projection      |  2 | n.age AS `n.age`     |            100 |  100 |     200 |                |                        |           |             |                     |
| |                +----+----------------------+----------------+------+---------+----------------+                        |           +-------------+                     |
| +NodeByLabelScan |  3 | n:Person             |            100 |  100 |     101 |            120 |                    3/0 |     2.495 |             | Fused in Pipeline 0 |
+------------------+----+----------------------+----------------+------+---------+----------------+------------------------+-----------+-------------+---------------------+

Total database accesses: 301, total allocated memory: 3920

here above we create 100 :Person nodes and with each :Person having a unique age, i.e. age 1 through 100.
And running the query Match (p: Person) return p order by p.age limit 10 returns :Persons with ages 1 through 10. and as expected.
And from the PROFILE and reading from bottom up we see

| +NodeByLabelScan |  3 | n:Person             |            100 |  100

and the 100 indicating we are reading 100 nodes.
also take note of Total database accesses: 301, total allocated memory: 3920 for this will be compared further below.

Now lets add index on age property and see how this changes

@orderby> create index age for (n:Person) on n.age;
0 rows
ready to start consuming query after 61 ms, results consumed after another 0 ms
Added 1 indexes

and rerunning the same Match (p: Person) return p order by p.age limit 10

returns the same 10 rows as above and the profile is the same as above and the index is not utilized since there is no where clause restriction on said property indexed. However if we change to

Match (p: Person) where p.age>0 return p order by p.age limit 10

then we see

| +NodeByLabelScan |  3 | n:Person             |            100 |  100

is replaced by

| +NodeIndexSeekByRange |  2 | RANGE INDEX p:Person(age) WHERE age > $autoint_0 |              3 |   11 |      12 |            120 |                    2/1 |     5.656 | p.age ASC  | Fused in Pipeline 0 |
+-----------------------+----+--------------------------------------------------+----------------+------+---------+----------------+------------------------+-----------+------------+---------------------+

Total database accesses: 32, total allocated memory: 184

10 rows

and here we see Total database accesses: 32, total allocated memory: 184 as compared to the prior of Total database accesses: 301, total allocated memory: 3920

so in short and as above, these are easy methods to validate on your own

elaine_rosenber · March 30, 2023, 2:54pm

Hello @hm873154 ,

You might want to take a look at the pipelining lesson in the Intermediate Query course where we teach about moving limit up in the query:

https://graphacademy.neo4j.com/courses/cypher-intermediate-queries/5-pipelining-queries/06-pipelining/

Topic		Replies	Views
A very newbie question: how does "LIMIT" in Cypher limit results? Neo4j Graph Platform migrated	2	294	June 27, 2022
How does the LIMIT clause work? Cypher cypher	2	629	May 23, 2020
Limit Data Processed By Subqueries Cypher cypher , limit , subquery	0	117	February 29, 2024
Limiting MATCH results per row Cypher cypher , knowledge-base , limit	0	1734	August 23, 2018
LIMIT 6 returns 7? Graph Academy	3	438	March 1, 2021

Partial/Whole computation of results with Limit operator

Related topics