Performance of Spatial Intersections

brian1
Node Link

Hello all
I'm evaluating the performance of Neo4j for a geospatial intersection between points. The intent is as follows: take 300,000 Address points, each with a street address, latitude, and longitude, and find which addresses are within 50 m of each other. The query below works fine on very small datasets, but it is extremely unwieldy on a dataset of 300,000 points. Because of the Cartesian product between the two Address matches, it takes quite a long time.

Is there something I can do to optimize this further? I can achieve much faster results using GIS tools such as QGIS, but I'd prefer to keep the work pipeline within Neo4j if the functionality can support it.

CREATE index for (n:Address) on (n.address) ;
CREATE index for (n:Address) on (n.location) ;

MATCH (p:Address)
WHERE p.location is NULL
AND p.latitude is not NULL
SET p.location = Point({latitude:tofloat(p.latitude), longitude:tofloat(p.longitude)})
RETURN count(*);


CALL apoc.periodic.iterate( 
'
MATCH (p1:Address) 
WHERE exists(p1.location)
WITH p1
MATCH (p2:Address) 
WHERE exists(p2.location) 
AND distance(p1.location, p2.location) < 50
AND p1.id <> p2.id
RETURN p1.address, p2.address
',
'
merge(p1)-[r:geo_intersect]-(p2)
',
{batchSize:10000, iterateList:true, parallel:false}  
);

1 REPLY

brian1
Node Link

OK, so modifying:

RETURN p1.address, p2.address

to

RETURN p1,p2

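greatly increases performance. I had inadvertently left a debugging RETURN statement in the production query. The second statement refers to p1 and p2, so the first statement has to return the nodes themselves rather than their address properties.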
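
For reference, here is a sketch of what the full call might look like with the corrected RETURN. It keeps the distance() function and index syntax from the original post (in Neo4j 5 the function is named point.distance()). Two changes are assumptions on my part and not from the original: IS NOT NULL replaces exists(), and id(p1) < id(p2) replaces p1.id <> p2.id so that each unordered pair is processed once. That should be equivalent here because the MERGE is undirected. Whether the planner actually uses the index on location for the distance predicate is worth checking by running the inner MATCH with EXPLAIN.

```cypher
// First statement: find each pair of addresses within 50 m and return the nodes.
// Second statement: runs per batch, with p1 and p2 bound to the returned nodes.
CALL apoc.periodic.iterate(
'
MATCH (p1:Address)
WHERE p1.location IS NOT NULL
MATCH (p2:Address)
WHERE distance(p1.location, p2.location) < 50
  AND id(p1) < id(p2)
RETURN p1, p2
',
'
MERGE (p1)-[:geo_intersect]-(p2)
',
{batchSize: 10000, iterateList: true, parallel: false}
);
```

If the address property id from the original query is unique and comparable, p1.id < p2.id would serve the same purpose as the internal id comparison.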
