How to reduce the db hit during the CYPHER execution

Hi ,
I started exploring the possibility of neo4j in our domain and as of now we are using community version .I am from the RDBMS background and this below question comes up everytime when i run the CYPHER .Your help will be greatly appreciated .Below is my simple SQL

neo4j> profile MATCH (origin:CS_LOCATION), (destination:CS_LOCATION)WHERE origin.location_id in[635] and destination.location_id in[1104642] return origin, destination;

+-------------------+----------------+------+---------+-----------+---------------------+---------------------------+
| Operator | Estimated Rows | Rows | DB Hits | Cache H/M | Identifiers | Other |
+-------------------+----------------+------+---------+-----------+---------------------+---------------------------+
| +ProduceResults | 625 | 1 | 0 | 0/0 | destination, origin | |
| | +----------------+------+---------+-----------+---------------------+---------------------------+
| +CartesianProduct | 625 | 1 | 0 | 0/0 | destination, origin | |
| |\ +----------------+------+---------+-----------+---------------------+---------------------------+
| | +NodeIndexSeek | 25 | 1 | 2 | 0/0 | destination | :CS_LOCATION(location_id) |
| | +----------------+------+---------+-----------+---------------------+---------------------------+
| +NodeIndexSeek | 25 | 1 | 2 | 0/0 | origin | :CS_LOCATION(location_id) |
+-------------------+----------------+------+---------+-----------+---------------------+---------------------------+

Here i am referencing CS_LOCATION twice and i have an index on location_id .The first read should be from the storage and a db hit .Is there anyway i can change the CYPHER to make the 2nd reference of CS_LOCATION from the cache since it is already selected from the first MATCH ?

Thanks in advance
Sojan

Because the destination id is not in the first filtering clause, it makes sense that the id isn't going to be in the cache and why you're seeing a second round of DB hits. You could try some funky manipulations but before trying find ways, I'm curious in a higher level what are you trying to accomplish? You're query is generating a cartesian product, is that your intention and why you don't have any relationships specified between the nodes? Or are you trying to detach if a path exists? I would have expected at least a query like this to test if the nodes are connected.

PROFILE 
MATCH (origin:CS_LOCATION {id: 635})--(destination:CS_LOCATION {id: 1104642})
RETURN origin, destination
1 Like

Thanks for the reply .There is no connection between these 2 nodes and these are under a CS_LOCATION label .It is similar to doing a flight ticket searching system where we specify From and To .This part is the beginning of a Bigger SQL .I was focusing on just the logic of the 2 db hits and see somehow we can avoid that .From the first sentence from your reply it kind of make sense now .I was thinking ,for origin it pulls all the CS_LOCATION label (2 million nodes) into page cache and filter for the origin .Since the 2 million nodes are already in the cache why are we hitting the db again for destination .That was my thinking .If the first db hit picks up only the origin related data into page cache then your first sentence make really sense .

From the plan, it's doing an index lookup, so it's not a filtering operation for all :CS_LOCATION nodes (if so you would see a NodeByLabelScan followed by a Filter), it's far quicker. Two separate index seeks like this, for 2 db hits, is trivial, this isn't something that requires optimization.

Also, db hits are units of db work, but that doesn't differentiate between a hit on the cache and a cache miss hitting disk. Both will show the same db hits.

2 Likes

I think the cache will be used inherently. You do not need to explicitly modify the cypher query.

PS: Why not use
match (loc:CS_LOCATION) where loc.id in [id1, id2]

1 Like