I'm trying to find the closest point to a given point in Neo4j 5 Enterprise. The obvious answer is to use the point.distance function, but it doesn't perform the way I would like.
To explain: I use point.distance in a query like this:
PROFILE WITH point({longitude:-5.9, latitude:40.7}) AS poi
MATCH (n:LocationNodes)
WITH n, point.distance(poi, n.point) as distance
ORDER BY distance
LIMIT 1
RETURN n
However, the execution plan shows that this query calculates the distance from my poi to ALL LocationNodes in the database. With 10 million LocationNodes nodes, that is a lot of DB hits and will not perform well.
What I really want is for the index to be used, so that the index itself returns the closest point. I was hoping the index would store points ordered by distance, or something like that.
My test example is as follows:
CREATE POINT INDEX LocationNodes_point FOR (node:LocationNodes) ON (node.point);
CREATE (:LocationNodes {point:point({longitude:-6.1, latitude:40.7})})
CREATE (:LocationNodes {point:point({longitude:-6.2, latitude:40.8})})
CREATE (:LocationNodes {point:point({longitude:-6.4, latitude:40.6})})
CREATE (:LocationNodes {point:point({longitude:-5.9, latitude:40.7})})
CREATE (:LocationNodes {point:point({longitude:-5.8, latitude:40.8})})
CREATE (:LocationNodes {point:point({longitude:-5.5, latitude:40.6})})
CREATE (:LocationNodes {point:point({longitude:-4.1, latitude:40.7})})
CREATE (:LocationNodes {point:point({longitude:-4.4, latitude:40.8})})
CREATE (:LocationNodes {point:point({longitude:-4.7, latitude:40.6})})
The distance query above scans all 9 nodes and calculates the distance for each, for a total of 30 DB hits. That is with only 9 point nodes; with millions, this is going to be a performance problem.
In contrast, if I run the following query:
PROFILE WITH point({longitude:-6.10, latitude:40.81}) AS poi
MATCH (n:LocationNodes) WHERE point.distance(poi, n.point) <10000
RETURN n
This hits the index and results in only 4 DB hits. If I increase the number of nodes in the system, my understanding is that this will remain at 4 DB hits as long as only 1 point lies within the 10000 meter radius. This is exactly what I would expect: a bounded number of DB hits for any number of nodes. It is also what I want when calculating the closest node.
Here the index is used to find the 1 node within the 10000 meter distance, so the index evidently knows something about the distance between points. How can I use the index to get the closest point? Or is there another way to find the closest point node?
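For reference, one possible combination of the two queries above is sketched below. It is only a sketch: it assumes I can pick a radius (10000 meters here, an arbitrary value) that is guaranteed to contain at least one node. The radius predicate should let the planner use the point index to narrow the candidates, so only those are sorted, but if no node lies inside the radius the query returns nothing.

```cypher
// Sketch: narrow the candidates with the index-backed radius predicate,
// then sort only the surviving nodes by distance.
// Assumption: 10000 m is a radius known to contain at least one node.
PROFILE WITH point({longitude:-5.9, latitude:40.7}) AS poi
MATCH (n:LocationNodes)
WHERE point.distance(poi, n.point) < 10000
RETURN n, point.distance(poi, n.point) AS distance
ORDER BY distance
LIMIT 1
```

This does not really solve my problem, because the radius has to be guessed up front: too small and nothing is returned, too large and many nodes are sorted again.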