Cypher: Calculate distances

dlyberis · September 20, 2022, 12:35pm

I am trying to calculate the distances between nodes.

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
match(node)-[:hasLocation]-(b:Location) 
WITH node,point({latitude: avg(b.lat), longitude: avg(b.long)}) as entity_point
return node.name, entity_point

Is it possible to calculate the distance between the returned nodes and find the closest neighbor of each one? As a continuation of the above query, is there a for-loop-like way to implement something like this?

Thank you in advance.

glilienfield · September 21, 2022, 4:43pm

Try this version. It should give you a list of all the nodes, each with its three closest other nodes in a list. To keep the code simple to understand, I calculated the distance between each pair of nodes in both orders. I assume the distance calculation does not take that long.

The double unwind of points produces rows that are the Cartesian product of the list's elements. With this, you can calculate the distance between every two points. The rows that pair a point with itself are filtered out on line 8. I used list slicing to keep only the first three other nodes (which are the closest, since the data is sorted in ascending order). This is what you correctly did too.

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
match(node)-[:hasLocation]-(b:Location) 
with node, point({latitude: avg(b.lat), longitude: avg(b.long)}) as entity_point
with collect({name: node.name, point: entity_point}) as points
unwind points as a
unwind points as b
with a, b
where a.name <> b.name
with a, b, distance(a.point, b.point) as distance
order by distance
with a.name as name, collect({name: b.name, distance: distance}) as bNodes
return name, collect(bNodes)[..3] as otherNodes
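
The double-unwind idea can be tried in isolation, without the n10s procedure or any stored data. The following is a minimal sketch, not the thread's actual data: the three named points are made-up sample values, and it assumes a Neo4j version where `point.distance()` is available (the thread's `distance()` is the older name for the same calculation).

```cypher
// Hypothetical sample points standing in for the averaged entity locations
WITH [
  {name: 'A', point: point({latitude: 37.98, longitude: 23.73})},
  {name: 'B', point: point({latitude: 40.64, longitude: 22.94})},
  {name: 'C', point: point({latitude: 38.25, longitude: 21.73})}
] AS points
// Two unwinds of the same list yield every ordered pair (a, b): 3 x 3 = 9 rows
UNWIND points AS a
UNWIND points AS b
// Drop the 3 rows that pair a point with itself, leaving 6 rows
WITH a, b
WHERE a.name <> b.name
// Distance in metres for WGS-84 latitude/longitude points
RETURN a.name AS nameA, b.name AS nameB, point.distance(a.point, b.point) AS metres
ORDER BY nameA, metres
```

Each pair shows up twice (A to B and B to A), which is the duplicated work the post accepts in exchange for a simpler query.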

dlyberis · September 21, 2022, 2:16pm

With your code I can see how Cypher handles data manipulation and aggregation; it is really helpful. I would prefer each name to have only the first 3 elements with the shortest distance.

I tried this one. Is it a proper way?

match(node)-[:hasLocation]-(b:Location) 
with node, point({latitude: avg(b.lat), longitude: avg(b.long)}) as entity_point
with collect({name: node.name, point: entity_point}) as points
unwind range(0,size(points)-2) as index
with index, points[index] as n, points[index+1..] as otherNodes
with index, n.name as name, [x in otherNodes |{name: x.name, distance: distance(n.point,x.point)}] as distances
unwind distances as distance
with index, name, distance
order by distance
with index,name,collect(distance) as distances
return index, name,distances[0..3]

dlyberis · September 21, 2022, 1:24pm

Thank you very much for your answer. It worked after using the distance function on line 7, as you can see in the following Cypher code.

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
match(node)-[:hasLocation]-(b:Location) 
with node, point({latitude: avg(b.lat), longitude: avg(b.long)}) as entity_point
with collect({name: node.name, point: entity_point}) as points
unwind range(0,size(points)-2) as index
with index, points[index] as n, points[index+1..] as otherNodes
return index, n.name, [x in otherNodes |{name: x.name, distance: distance(n.point,x.point)}] as distances

A part of the result is depicted in the following image.

Is it possible to order each returned distances list by its "distance" key for each n.name? Is there a short way to do it in the code that you provided?
I really appreciate your help!

glilienfield · September 21, 2022, 1:12am

I don't have the library or data to test this. You can try to see if it results in what you are looking for.

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
match(node)-[:hasLocation]-(b:Location) 
with node, point({latitude: avg(b.lat), longitude: avg(b.long)}) as entity_point
with collect({name: node.name, point: entity_point}) as points
unwind range(0,size(points)-2) as index
with index, points[index] as n, points[index+1..] as otherNodes
return index, n.name, [x in otherNodes | {name: x.name, distance: n.point - x.point}] as distances

The distance between two points is represented by the placeholder expression 'n.point - x.point' on line 7. Replace this with the actual distance calculation between two entity points. The result of the query will be a row for each node, containing the name of the node and a collection of the other nodes and their distance from the row's node. The number of calculations per row decreases by one with each row, as the algorithm does not calculate both 'n.point - x.point' and 'x.point - n.point'. Let me know if there are issues and we can see if we can resolve them.
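
To see how the range and slice pairing on lines 5 and 6 behaves, it can help to run it on a plain list. A small sketch, with made-up letters in place of the point maps:

```cypher
// Hypothetical four-element list in place of the collected points
WITH ['A', 'B', 'C', 'D'] AS points
// Stop at size - 2 so the last element never becomes n (nothing comes after it)
UNWIND range(0, size(points) - 2) AS index
RETURN index, points[index] AS n, points[index+1..] AS otherNodes
// index 0: n = 'A', otherNodes = ['B', 'C', 'D']
// index 1: n = 'B', otherNodes = ['C', 'D']
// index 2: n = 'C', otherNodes = ['D']
```

Each unordered pair appears exactly once (3 + 2 + 1 = 6 pairs for four elements), which is why the last element gets no row of its own.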

glilienfield · September 21, 2022, 4:46pm

Oops, there is an error in line 12 and I can't edit the previous post. The following is the corrected version.

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
match(node)-[:hasLocation]-(b:Location) 
with node, point({latitude: avg(b.lat), longitude: avg(b.long)}) as entity_point
with collect({name: node.name, point: entity_point}) as points
unwind points as a
unwind points as b
with a, b
where a.name <> b.name
with a, b, distance(a.point, b.point) as distance
order by distance
return a.name as name, collect({name: b.name, distance: distance})[..3] as otherNodes
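
One plausible reading of the error being corrected here: in the earlier version the neighbour list was already collected into `bNodes`, and wrapping it in a second `collect()` produced a list containing one list, so `[..3]` sliced the wrapper rather than the neighbours. A small sketch with made-up numbers shows the difference:

```cypher
// Hypothetical stand-in for an already-collected, sorted neighbour list
WITH [10, 20, 30, 40, 50] AS bNodes
// Collecting again wraps the list in an outer list of length 1
WITH bNodes, collect(bNodes) AS wrapped
RETURN wrapped[..3] AS slicedWrapper,  // [[10, 20, 30, 40, 50]]
       bNodes[..3]  AS slicedList      // [10, 20, 30]
```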

glilienfield · September 21, 2022, 4:28pm

The way I wrote the code, each successive name is compared against fewer and fewer other nodes, and the last name gets no output row at all. This is because I only calculated the distance between the current node and the remaining nodes in the list, since the distance calculation is commutative. As a result, the lists as they stand are not necessarily the three closest nodes for each node. The only node with this property is the first one, as its list contains the distance to every other node.

This can be fixed. The easiest way is to calculate the distance from each node to all other nodes and keep the closest three for each. This gives up the efficiency of not calculating both distance(a, b) and distance(b, a). The above query can be modified to retain that efficiency; it is just more work and less understandable. If you want, I can alter it so you get a valid list of the three closest nodes for each node.
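
As a rough illustration of the "more work" variant described above, the sketch below computes each distance once and then emits it for both endpoints, so every node still gets a complete neighbour list. It is untested against the thread's data: the sample points are invented, `src` and `dst` are illustrative names, and `point.distance()` is assumed to be available (substitute `distance()` on older Neo4j versions).

```cypher
// Hypothetical sample points standing in for the averaged entity locations
WITH [
  {name: 'A', point: point({latitude: 37.98, longitude: 23.73})},
  {name: 'B', point: point({latitude: 40.64, longitude: 22.94})},
  {name: 'C', point: point({latitude: 38.25, longitude: 21.73})},
  {name: 'D', point: point({latitude: 35.34, longitude: 25.13})},
  {name: 'E', point: point({latitude: 39.64, longitude: 22.42})}
] AS points
// Visit each unordered pair (i < j) exactly once
UNWIND range(0, size(points) - 2) AS i
UNWIND range(i + 1, size(points) - 1) AS j
WITH points[i] AS a, points[j] AS b
WITH a, b, point.distance(a.point, b.point) AS dist
// Emit the same distance for both directions without recomputing it
UNWIND [{src: a.name, dst: b.name}, {src: b.name, dst: a.name}] AS pair
WITH pair.src AS name, pair.dst AS other, dist
ORDER BY dist
// Like the thread's queries, this relies on collect() keeping the sorted row order
WITH name, collect({name: other, distance: dist}) AS neighbours
RETURN name, neighbours[..3] AS otherNodes
```

Whether the saved distance calls outweigh the extra unwind is something to measure on real data; for small point lists the simpler double unwind may be just as fast.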

glilienfield · September 21, 2022, 2:03pm

Try this:

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
match(node)-[:hasLocation]-(b:Location) 
with node, point({latitude: avg(b.lat), longitude: avg(b.long)}) as entity_point
with collect({name: node.name, point: entity_point}) as points
unwind range(0,size(points)-2) as index
with index, points[index] as n, points[index+1..] as otherNodes
with index, n.name as name, [x in otherNodes |{name: x.name, distance: distance(n.point,x.point)}] as distances
unwind distances as distance
with index, name, distance
order by distance desc
return index, name, collect(distance) as distances

Do you want each name to have the full list of other names and their corresponding distances?
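
If the goal is the closest nodes first, one option is to sort ascending, and on the map's `distance` key explicitly rather than on the whole map. A minimal sketch with invented values:

```cypher
// Hypothetical name/distance maps for a single node
WITH [
  {name: 'B', distance: 300.0},
  {name: 'C', distance: 120.0},
  {name: 'D', distance: 210.0},
  {name: 'E', distance: 95.5}
] AS distances
UNWIND distances AS d
// Sort on the numeric key, smallest first
WITH d
ORDER BY d.distance
// Keep the three nearest: E (95.5), then C (120.0), then D (210.0)
RETURN collect(d)[..3] AS closest
```

Sorting on `d.distance` avoids depending on how Cypher compares whole maps, and dropping `desc` puts the shortest distances at the front of the list.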
