Nearest neighbor to a node from a set of nodes of specific type


(Violina Zh) #1

I currently imported a road map graph in Neo4j and I would like to execute some queries on it. There are crossroads, mapped as vertices and roads, mapped as relationships. Also, there are points of interest, which are mapped to a crossroad, using a relationship. Here is an example of the graph (not my picture but my graph looks similar): graph and my sample data: data

What I would like to be able to achieve is to find for a single crossroad the nearest neighbors from every POI category type (e.g. restaurant, coffee shop...). I tried using the shortest path algorithm (see below) but it runs for a very long time and I need to do this for every crossroad in the map (I have around 500 000):

MATCH (a:CROSSROAD)-[k:HAS_POI]->(o:POI {id: 10, type: "coffee_shop"})
MATCH (c:CROSSROAD)-[h:HAS_POI]->(p:POI {type: "restaurant"})
CALL algo.shortestPath(a, c, 'distance')
YIELD totalCost
RETURN p.id as POI, totalCost
ORDER BY totalCost
limit 1

Is there a better and faster way to do this?

I wonder if there is maybe a way to traverse from the node by taking the smallest distance on the ROAD relationship until a POI from the specific type is reached. Although I am not sure how to express this in Cypher language.

PS: I am quite new to Neo4j and Cypher and I am still learning, so excuse me if this is an obvious question.


(Paul Thomas) #2

I don't have access to the sample data but ...

With your top two matches, you end up trying to compute the shortest path between all possible crossroad pairs's. With 500,000 crossroads you are going to have probably billions of combos ... not good!

Instead collect a list of all possible poi types and loop over this list ...

match (p:POI)
with collect(distinct p.type) as poitypes
unwind poitypes as poitype
...

then need to fill in the code then to match each crossroad against the nearest poi of each type. Once you find the nearest poi, maybe add a new link but remember there could be multiple, equally closest coffee-shop from a corner e.g.


(Violina Zh) #3

Thank you very much for the quick answer!

then need to fill in the code then to match each crossroad against the nearest poi of each type. Once you find the nearest poi, maybe add a new link but remember there could be multiple, equally closest coffee-shop from a corner e.g.

I appreciate your input, but this is exactly with what I am currently struggling and what I don't understand how to do. How to find for a single crossroad the nearest POI from a type "restaurant" for example without traversing all restaurants?

I would eventually want to create a table (in the form of 2d array in Java) to store for every CROSSROAD point the nearest POI from every type. I don't mind the code running for 5 hours, because I will execute it only once, but my current solution is going to take days as I foresee.

Edit: I fixed the data link.


(Paul Thomas) #4

Something like this might help get you started ...

CREATE (a:CROSSROAD {name:'A'})
CREATE (b:CROSSROAD {name:'B'})
CREATE (c:CROSSROAD {name:'C'})
CREATE (d:CROSSROAD {name:'D'})
CREATE (e:CROSSROAD {name:'E'})
CREATE (f:CROSSROAD {name:'F'})
CREATE (g:CROSSROAD {name:'G'})

CREATE (a)-[:ROAD {cost:50}]->(b)
CREATE (b)-[:ROAD {cost:60}]->(c)
CREATE (c)-[:ROAD {cost:70}]->(d)
CREATE (b)-[:ROAD {cost:20}]->(d)
CREATE (c)-[:ROAD {cost:35}]->(e)
CREATE (e)-[:ROAD {cost:45}]->(f)
CREATE (f)-[:ROAD {cost:15}]->(g)

CREATE (s1:POI {name:'s1', poitype:'Shop'})
CREATE (s2:POI {name:'s2', poitype:'Shop'})
CREATE (c1:POI {name:'c1', poitype:'Church'})
CREATE (c2:POI {name:'c2', poitype:'Church'})
CREATE (h1:POI {name:'h1', poitype:'Hotel'})
CREATE (h2:POI {name:'h2', poitype:'Hotel'})
CREATE (zoo:POI {name:'zoo', poitype:'Zoo'})

CREATE (a)-[:HAS_POI {cost:0}]->(s1)
CREATE (b)-[:HAS_POI {cost:0}]->(s2)
CREATE (c)-[:HAS_POI {cost:0}]->(c1)
CREATE (d)-[:HAS_POI {cost:0}]->(c2)
CREATE (e)-[:HAS_POI {cost:0}]->(h1)
CREATE (a)-[:HAS_POI {cost:0}]->(h2)
CREATE (g)-[:HAS_POI {cost:0}]->(zoo)

Next query can now be used to find the distance to the closest POI of each type to a specified crossroad

match poipath = (c1:CROSSROAD)-[r:ROAD*0..3]->(c2:CROSSROAD)-->(p:POI)
where c1.name = 'A'
with c1, p, reduce(total=0, h in relationships(poipath) | total + h.cost) as totalCost
with c1, p.poitype as poitype, min(totalCost) as minCost
return poitype, minCost

poitype minCost
"Shop" 0
"Hotel" 0
"Church" 70

if you increase the numbers of hops, then the distance to the Zoo is also returned ...

match poipath = (c1:CROSSROAD)-[r:ROAD*0..10]->(c2:CROSSROAD)-->(p:POI)
where c1.name = 'A'
with c1, p, reduce(total=0, h in relationships(poipath) | total + h.cost) as totalCost
with c1, p.poitype as poitype, min(totalCost) as minCost
return poitype, minCost

poitype minCost
"Shop" 0
"Hotel" 0
"Church" 70
"Zoo" 205

(Violina Zh) #5

Thank you! This was very useful to me, I didn't know of this type of queries.

The problem I have now is that I want to find the nearest neighbors of all types in one go without having to know the number of hops. Is that even possible with Cypher?
I researched a lot the past few days and I considered maybe writing my own version of Dijsktra, where while expanding from a specific vertex the algorithm checks if a POI of one of the category types has been visited and puts it in the table I mentioned. Although I don't yet know how to do this without specifying a destination point. I would love to know your opinion on this :)


(Paul Thomas) #6

Thank you! This was very useful to me, I didn't know of this type of queries.

If you were not aware of ()-[r:RELTYPE*0..3]-() type queries, I think it would be a good idea to learn more about the capabilities of cypher before trying to solve your problem. This is a great video to learn from to go beyond the basics.

The problem I have now is that I want to find the nearest neighbors of all types in one go without having to know the number of hops. Is that even possible with Cypher?

Will a coffee shop twenty crossroads away be relevant or likely in real-life?

I think deciding to limit the search to something like 10 hops is a very sensible thing to do.

Trying to search across an unlimited numbers hops means you might write some complicated eg java logic or else maybe hit performance issues.