Can we improve this further: Optimize Expand Operation

phanindhar.bodla · August 30, 2018, 3:24pm

WITH [8521,8522,8519] as place_ids
MATCH (place:Location)-[:VISITED_BY]->(user:User)
where place.id in place_ids
WITH collect(distinct user) as users
return users

This query runs for 1.2 s for 3 place_ids places ids can grow up-to 20 with different combinations.
For just a single place its around 350ms and 70 % users are overlapped for places. I understand that we has visit a user multiple times for (place*users) no of relations. But is there way to hint the cypher to mark the visited node or parallelize or some other way to get the time down? It is too scary to imagine runtimes for 20 places. So please suggest anything that may cutdown the runtimes.

Thanks In Advance

david_allen · August 30, 2018, 3:37pm

Please put EXPLAIN in front of your query and post the query plan. The most immediate thing that jumps to mind is whether or not you have an index on :Location(id) and whether the query planner is using it.

The next thing that jumps to mind is the degree of those nodes, how many users have visited them. It could take some time to get a unique list if the total list of users who have visited a place is very large.

phanindhar.bodla · August 30, 2018, 4:16pm

Hi @david_allen, I've indexed both the nodes. Please have a look at the plan.

  neo4j> explain
   WITH [8521,8522,8519] as child_ids
   MATCH (lo:LO)<-[:SYLLABUS_LO{type:'in syllabus'}]-(child_syllabus:Syllabus)
   where child_syllabus.id in child_ids
       WITH collect(distinct lo) as los
       return los
   ;
+---------------------------------------------------------------------+
| Plan      | Statement   | Version      | Planner | Runtime   | Time |
+---------------------------------------------------------------------+
| "EXPLAIN" | "READ_ONLY" | "CYPHER 3.4" | "COST"  | "SLOTTED" | 5    |
+---------------------------------------------------------------------+

+------------------------+----------------+-----------------------------------------+-----------------------------------------------+
| Operator               | Estimated Rows | Identifiers                             | Other                                         |
+------------------------+----------------+-----------------------------------------+-----------------------------------------------+
| +ProduceResults        |            154 | los                                     | 3.4; 3.4                                      |
| |                      +----------------+-----------------------------------------+-----------------------------------------------+
| +EagerAggregation      |            154 | los                                     |                                               |
| |                      +----------------+-----------------------------------------+-----------------------------------------------+
| +Filter                |          23667 | anon[49], child_ids, child_syllabus, lo | lo:LO                                         |
| |                      +----------------+-----------------------------------------+-----------------------------------------------+
| +Expand(All)           |         236671 | anon[49], child_ids, child_syllabus, lo | (child_syllabus)-[anon[49]:SYLLABUS_LO]->(lo) |
| |                      +----------------+-----------------------------------------+-----------------------------------------------+
| +Apply                 |          23667 | child_ids, child_syllabus               |                                               |
| |\                     +----------------+-----------------------------------------+-----------------------------------------------+
| | +NodeUniqueIndexSeek |              9 | child_ids, child_syllabus               | :Syllabus(id)                                 |
| |                      +----------------+-----------------------------------------+-----------------------------------------------+
| +Projection            |              1 | child_ids                               | {child_ids : $`  AUTOLIST0`}                  |
+------------------------+----------------+-----------------------------------------+-----------------------------------------------+

Also the I've used alias to provide a better context for community to understand. This is the label mapping Location = Syllabus , User = LO, in the above query. BTW I've used multi line union queries and they were not even close to this in runtime.

Many Thanks,

andrew_bowman · August 30, 2018, 4:38pm

Next step is to PROFILE this and provide us with the profile plan so we can see the rows and db hits.

If you can take the visual profile plan from running through the browser, please make sure to expand all elements before saving the image.

phanindhar.bodla · August 30, 2018, 4:52pm

PFA

andrew_bowman · August 30, 2018, 5:04pm

This looks like an opportunity to change up your model a bit to allow more streamlined expansions.

We can see that your expansion hits more than 77k nodes, and only after filtering (both on the relationship 'type' property and on the :LO label of the connecting nodes) has a high number of db hits and narrows down results to 17k rows.

Node label filtering is relatively cheap, so my guess is that most of the effort is spent on relationship property access to filter on the type property. If you can refactor your graph so you have distinct relationship types rather than using a 'type' property, it's likely we can make this much cheaper, both by avoiding property access, and by avoiding expansion of relationships whose types we don't care about.

So if you use :IN_SYLLABUS relationships instead of :SYLLABUS_LO with a type (and likewise use distinct relationship types in place of the other 'type' values) you're likely to save quite a bit whenever you're doing expansions that should only be on specific types.

phanindhar.bodla · August 30, 2018, 5:12pm

Hi @andrew_bowman,

This approach surely does look interesting . Will try out this, and will update the thread.
Thanks for your time. Also it will be great if you can suggest some good books/articles/references on modeling graph for better runtimes.

andrew_bowman · August 30, 2018, 5:17pm

You'll probably want to go through Max De Marzi's blog, he's great with optimizing models.

In general though it's a good idea to try to minimize property access, including filtering by property (this is not the same as lookup by property, which can leverage index lookup) unless there's no way around it.

Using distinct relationship types (as in this case) is probably the most useful kind of optimization for traversals (Max uses this quite a bit), since that means we don't even have to expand the types we're not interested in, so we don't even have to perform filtering.

phanindhar.bodla · August 30, 2018, 7:52pm

@andrew_bowman Thanks for the great tip. Doing what you suggested helped me to achieve exactly the thing that am looking for, After adding the new label for these relations I've tried to understand the reason, for which, I've profiled and observed that a node is visited exactly once. This is a huge improvement.

Thanks,

andrew_bowman · August 30, 2018, 7:54pm

Excellent, glad to help!

Topic		Replies	Views
Querying relationships slow performance Cypher performance , cypher , relationship	4	2203	October 15, 2020
Performance of a parallel query execution Neo4j Graph Platform performance	1	2455	April 26, 2019
Neo4j Cypher query execution optimization Neo4j Graph Platform migrated	2	122	July 19, 2022
Performance problems with apoc.path.expand and apoc.run.cypher Procedures & APOC apoc , performance , cypher	3	352	May 13, 2021
How to optimize execution time for this simple request? Neo4j Graph Platform apoc , cypher	4	92	March 25, 2025

Take the Course Then Join The Aura Agent Hackathon

Can we improve this further: Optimize Expand Operation

Related topics

Take the Course Then Join
The Aura Agent Hackathon