I haven't done anything with geospatial before but... with my experience and book knowledge, I can ask sanity questions to help diagnose the problem...
In the very first PROFILE you show in this thread, the distance query resulted in 6M DB hits. How many total GraphUser nodes do you have in your system? If it's 6M, then the original query probably had to do a linear search (painfully slow!) over the entire GraphUser nodes.
In theory, a geospatial index should help reduce the search space as the query should be able to quickly exclude points outside of the bounding box using the index. (It would do a gathering of nodes that fell within the bounding box of lat, long ranges corresponding to a distance of 100000 ). I'm not sure if you had the geospatial index on location with the first attempt or not.
Are you using PROFILE
or EXPLAIN
? I think PROFILE
might be better as it tells you actually what occurred.
One thing to try, is do the query with and without the index on the location. If it doesn't improve, then it's geospatial indexing is not behaving the way that I expect and you'll have to ask someone else... (You could duplicate the DB and then drop the geospatial index in the duplicated DB and rerun the query.)
I am somewhat puzzled by the PROFILE you did based on my subquery suggestion. The subquery 91,872 DB hits but the following Geospatial query has almost exactly twice as many hits: 183k. This suggests to me that geospatial search on the results of the subquery has to do a linear search twice (once for lat and once for long.). I guess this makes sense, but it's why the query is still slow.
The other thing, are there any other properties that you can narrow down the search on before passing it onto the geospatial search?
Lastly, what if you make the distance to something smaller? (e.g. 1000 instead of 100000?) If the indexing is working, then I believe the PROFILE
might show far fewer DB hits. If it doesn't, then something isn't working the way that I expect.
I need to think about this some more...