
Optimizing a query and understanding the profiler

eric7

Hello,

I am trying to optimize a query I have been working on, but I do not understand why the Cypher/Neo4j profiler reports as many db hits as it does.

The query below tries to find all mutual contacts for a given user $user_id.

1st pass

profile MATCH (u1:User {user_id: $user_id})-[:CONTACT]->(u2:User)
where exists ((u2)-[:CONTACT]->(u1))
return u1,u2

2nd pass (better but still not great)

profile MATCH (u1:User {user_id: $user_id})-[:CONTACT]->(u2:User)
with u1,u2
match ((u2)-[:CONTACT]->(u1))
return u1,u2

My understanding is that by using WITH I am signaling to the second MATCH clause that the start and end nodes are already known. However, the profiler seems to tell me that this step incurs the most db hits (screenshot: Screen-Shot-2021-06-03-at-2-58-21-PM, ImgBB). I'm confused about best practices for cases like this and how to optimize my query. Thank you!

1 REPLY

Either approach should work. Generally I would favor the first query.

Given that you already have a unique constraint on :User(user_id), there's not much more tuning you can do here.
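If you want to double-check that the constraint is in place, something like the following sketch may help. It assumes Neo4j 4.4 or later; the constraint name user_id_unique is a hypothetical choice, and older 4.x releases use ON (u:User) ASSERT u.user_id IS UNIQUE in place of the FOR ... REQUIRE form.

```cypher
// List existing constraints to confirm one covers :User(user_id).
SHOW CONSTRAINTS;

// Create the constraint only if it is missing (Neo4j 4.4+ syntax).
// The name user_id_unique is hypothetical; pick any name you like.
CREATE CONSTRAINT user_id_unique IF NOT EXISTS
FOR (u:User) REQUIRE u.user_id IS UNIQUE;
```

With the constraint present, the profile for either query should begin with a NodeUniqueIndexSeek on u1 instead of a label scan.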

As for the number of db hits, perhaps the output of this query will show you how many relationships the query must consider before it arrives at the 28 resulting rows:

match (u1:User {user_id: $user_id})-[:CONTACT]->(u2:User)
with u1,u2
match (u2)-[:CONTACT]->()
return count(*) as relationshipsRequiringFiltering
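
To see which contacts contribute most of those relationships, a per-contact breakdown along these lines might be useful. This is only a sketch: the aliases contact and outgoingContacts and the LIMIT of 10 are arbitrary choices for illustration.

```cypher
// For each contact of the given user, count that contact's own
// outgoing CONTACT relationships. OPTIONAL MATCH keeps contacts
// that have none (count(r) is 0 for them).
MATCH (u1:User {user_id: $user_id})-[:CONTACT]->(u2:User)
OPTIONAL MATCH (u2)-[r:CONTACT]->()
RETURN u2.user_id AS contact, count(r) AS outgoingContacts
ORDER BY outgoingContacts DESC
LIMIT 10
```

A few contacts with very large counts would account for most of the db hits, since each of their outgoing relationships may need to be checked against u1.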