05-16-2019 11:36 PM
I've been trying to follow the GraphConnect 2018 video on loading OSM data into a routable graph (https://neo4j.com/graphconnect-2018/session/neo4j-spatial-mapping). All goes well until I try the Cypher shown at 21:26. If I run the Cypher exactly as shown (including 'LIMIT 100' on the match), won't that only set up a [:ROUTE] relationship for 100 intersections? Regardless, if I try to batch process the job via apoc.periodic.iterate, it seems to crash the Neo4j server. There is nothing obvious in the logs, just the Cypher executed, followed by
… in separate thread
and then
INFO [o.n.g.f.GraphDatabaseFacadeFactory] Shutdown started
Any ideas on how to execute this across all matching nodes? I've tried invoking the procedure using the different parameters given as an example on the repo readme:
CALL spatial.osm.routeIntersection(x,false,false,false)
but get the same result. I've even tried running
CALL spatial.osm.routeIntersection(x,true,true,true)
which according to the docs creates the relationship minus the distance property, but that too causes a server crash if run for more than 100 nodes.
Any help appreciated, thanks!
05-17-2019 09:30 AM
In the presentation I showed versions of the queries that had LIMIT in them and did not use apoc.periodic.iterate only because they were nicer to show visually, but in building the graph I certainly used the periodic.iterate versions all the time, as you suspected.
The symptoms you describe suggest that you are probably running out of memory. I know I needed to tweak memory settings to make the most of my RAM, and the apoc.periodic.iterate settings were also important for getting the best performance and memory usage. I don't have records of the exact tweaking I did, but I do have a copy of the notes I took for the queries I ran:
Here are the queries relevant to building the routing graph:
//
// Identify (:OSMNode) instances that are intersections (connected INDIRECTLY to more than one (:OSMWayNode)) and on ways or relations that are also streets.
//
MATCH (n:OSMNode)
WHERE size((n)<-[:NODE]-(:OSMWayNode)-[:NEXT]-(:OSMWayNode)) > 2
AND NOT (n:Intersection)
WITH n LIMIT 100
MATCH (n)<-[:NODE]-(wn:OSMWayNode), (wn)<-[:NEXT*0..100]-(wx),
(wx)<-[:FIRST_NODE]-(w:OSMWay)-[:TAGS]->(wt:OSMTags)
WHERE exists(wt.highway) AND NOT n:Intersection
SET n:Intersection
RETURN COUNT(*);
// Periodic iterate
CALL apoc.periodic.iterate(
'MATCH (n:OSMNode) WHERE NOT (n:Intersection)
AND size((n)<-[:NODE]-(:OSMWayNode)-[:NEXT]-(:OSMWayNode)) > 2 RETURN n',
'MATCH (n)<-[:NODE]-(wn:OSMWayNode), (wn)<-[:NEXT*0..100]-(wx),
(wx)<-[:FIRST_NODE]-(w:OSMWay)-[:TAGS]->(wt:OSMTags)
WHERE exists(wt.highway) AND NOT n:Intersection
SET n:Intersection',
{batchSize:10000, parallel:true});
MATCH (i:OSMNode) RETURN 'OSM Nodes' AS type, count(i)
UNION
MATCH (i:OSMPathNode) RETURN 'Nodes on paths' AS type, count(i)
UNION
MATCH (i:PointOfInterest) RETURN 'Points of interest' AS type, count(i)
UNION
MATCH (i:Intersection) RETURN 'Intersections' AS type, count(i);
// Produced 50k intersections in 185s for NY
// US-NE took 45 minutes to produce 789505
// San Francisco took 16s to process nodes Intersections
// San Francisco
//╒════════════════════╤══════════╕
//│"type"              │"count(i)"│
//╞════════════════════╪══════════╡
//│"OSM Nodes"         │2880804   │
//├────────────────────┼──────────┤
//│"Nodes on paths"    │235730    │
//├────────────────────┼──────────┤
//│"Points of interest"│3124      │
//├────────────────────┼──────────┤
//│"Intersections"     │53744     │
//└────────────────────┴──────────┘
//
// Find and connect intersections into routes
//
MATCH (x:Intersection) WITH x LIMIT 100
CALL spatial.osm.routeIntersection(x,true,false,false)
YIELD fromNode, toNode, fromRel, toRel, distance, length, count
WITH fromNode, toNode, fromRel, toRel, distance, length, count
MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
ON CREATE SET r.distance = distance, r.length = length, r.count = count
RETURN COUNT(*);
// With Periodic Iterate:
CALL apoc.periodic.iterate(
'MATCH (x:Intersection) RETURN x',
'CALL spatial.osm.routeIntersection(x,true,false,false)
YIELD fromNode, toNode, fromRel, toRel, distance, length, count
WITH fromNode, toNode, fromRel, toRel, distance, length, count
MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
ON CREATE SET r.distance = distance, r.length = length, r.count = count
RETURN count(*)',
{batchSize:100, parallel:false});
// San Francisco took 103s to perform 54k committed operations
// If there are errors, repeat with a smaller batch size to better cope with stack overflow errors
CALL apoc.periodic.iterate(
'MATCH (x:Intersection) WHERE NOT (x)-[:ROUTE]->() RETURN x',
'CALL spatial.osm.routeIntersection(x,true,false,false)
YIELD fromNode, toNode, fromRel, toRel, distance, length, count
WITH fromNode, toNode, fromRel, toRel, distance, length, count
MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
ON CREATE SET r.distance = distance, r.length = length, r.count = count
RETURN count(*)',
{batchSize:10, parallel:false});
// Now find Routable nodes from the PointOfInterest search and link them to the route map
MATCH (x:Routable:OSMNode)
WHERE NOT (x)-[:ROUTE]->(:Intersection) WITH x LIMIT 100
CALL spatial.osm.routeIntersection(x,true,false,false)
YIELD fromNode, toNode, fromRel, toRel, distance, length, count
WITH fromNode, toNode, fromRel, toRel, distance, length, count
MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
ON CREATE SET r.distance = distance, r.length = length, r.count = count
RETURN COUNT(*);
// With periodic iterate
CALL apoc.periodic.iterate(
'MATCH (x:Routable:OSMNode)
WHERE NOT (x)-[:ROUTE]->(:Intersection) RETURN x',
'CALL spatial.osm.routeIntersection(x,true,false,false)
YIELD fromNode, toNode, fromRel, toRel, distance, length, count
WITH fromNode, toNode, fromRel, toRel, distance, length, count
MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
ON CREATE SET r.distance = distance, r.length = length, r.count = count
RETURN count(*)',
{batchSize:10, parallel:false});
// SF took 16s to do 1538 committed operations
// The algorithm makes self relationships, so delete with
MATCH (a:Intersection)-[r:ROUTE]->(a) DELETE r RETURN COUNT(*);
// SF had 402 self relationships
// Now to get an idea of the distribution of route distances
MATCH (a:Intersection)-[r:ROUTE]->() RETURN 'All routes' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 25 RETURN '>25m' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 50 RETURN '>50m' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 100 RETURN '>100m' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 250 RETURN '>250m' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 500 RETURN '>500m' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 5000 RETURN '>5000m' AS type, COUNT(*) AS count;
// SF
//╒════════════╤═══════╕
//│"type"      │"count"│
//╞════════════╪═══════╡
//│"All routes"│86315  │
//├────────────┼───────┤
//│">25m"      │55662  │
//├────────────┼───────┤
//│">50m"      │40227  │
//├────────────┼───────┤
//│">100m"     │18992  │
//├────────────┼───────┤
//│">250m"     │3976   │
//├────────────┼───────┤
//│">500m"     │1174   │
//├────────────┼───────┤
//│">5000m"    │59     │
//└────────────┴───────┘
// To improve inner-city routing we can optionally remove some of the longer ones which might be falsely detected
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 500 DELETE r RETURN COUNT(*);
05-18-2019 08:42 PM
Many thanks Craig, will take another look at it this week and try some of your suggestions.
05-20-2019 03:36 AM
Are there any relevant memory settings I can look at besides server page cache and heap size? With generous settings for both, which have worked with other large imports, I'm still getting the same server crash, even when I try your small-batch iterator using batchSize:10 (and even batchSize:1).
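For anyone comparing setups, the memory settings actually in effect can be listed from Cypher rather than read out of neo4j.conf. This is a minimal sketch, assuming a Neo4j 3.x/4.x server where the setting names start with dbms.memory (newer versions use different names) and a user allowed to call the built-in dbms.listConfig procedure:

```cypher
// List the memory-related settings the running server is using.
// Assumption: setting names begin with 'dbms.memory' (Neo4j 3.x/4.x naming).
CALL dbms.listConfig('dbms.memory')
YIELD name, value
RETURN name, value
ORDER BY name;
```

This only reports configuration; it does not show how much memory a given procedure call consumes.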
01-10-2020 05:22 AM
Hi!
Did you figure this out? Having the exact same problem when running the query:
CALL apoc.periodic.iterate(
'MATCH (x:Intersection) WHERE NOT (x)-[:ROUTE]->() RETURN x',
'CALL spatial.osm.routeIntersection(x,true,false,false)
YIELD fromNode, toNode, fromRel, toRel, distance, length, count
WITH fromNode, toNode, fromRel, toRel, distance, length, count
MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
ON CREATE SET r.distance = distance, r.length = length, r.count = count
RETURN count(*)',
{batchSize:10, parallel:false});
I've tried both with and without batching, on small and large datasets, with the same result every time: the server just shuts down without any obvious hints in the logs.
01-10-2020 08:34 PM
I wasn't able to resolve it, and unfortunately haven't looked at the project since.
Craig might be able to offer you some suggestions?
04-29-2020 07:37 AM
Hey all - I was running into the same problem with the updated 0.2.3 importer: the database was crashing regularly. I dug into it for a while and realized that none of the :NEXT relationships had distance properties, which meant that the routeIntersection procedure couldn't work. Not sure why that crashed the server rather than reporting an error, but I added the distance property to all of the :NEXT relationships using the query Craig posted online (Slide 28).
That can be done before or after adding the intersection labels to the graph. I ran a couple of batches of 1000 without apoc.periodic and it worked fine, so it's crunching now on the whole model.
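For readers without the slides to hand, a query of that kind might look roughly like the sketch below. It is not a copy of the query on the slide. It assumes that each OSMNode carries a point property named location (an assumption about the importer's output, so check your own data first) and it uses the Neo4j 3.x/4.x syntax seen elsewhere in this thread; on Neo4j 5 the equivalents would be r.distance IS NULL and point.distance().

```cypher
// Sketch only: add a distance (in metres) to :NEXT relationships that lack one.
// Assumption: OSMNode has a point property called `location`.
MATCH (an:OSMNode)<-[:NODE]-(a:OSMWayNode)-[r:NEXT]->(b:OSMWayNode)-[:NODE]->(bn:OSMNode)
WHERE NOT exists(r.distance)
WITH r, an, bn LIMIT 10000
SET r.distance = distance(an.location, bn.location)
RETURN count(r);
```

Because of the LIMIT, the statement is meant to be re-run until it returns 0, which keeps each transaction small.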
Thanks everybody (especially Craig) for working on this!
05-06-2020 08:24 AM
As @waterdoggy has pointed out, it is necessary to follow the same procedures I originally used, as the current code is not production code designed to handle all contingencies, but was built specifically for that demo. It would be great to refine and improve these utilities and procedures for much more general purpose usage, but that will take time. I have recently started porting the various spatial libraries to Neo4j 4.0 and the changes are quite large, so that will take time, but hopefully will lead to a general cleanup as well.
04-09-2022 05:55 AM
I successfully ran the create-intersection and distance queries. If I run the intersection query with LIMIT 100, it works.
MATCH (x:Intersection) WITH x LIMIT 100
CALL spatial.osm.routeIntersection(x,true,false,false)
YIELD fromNode, toNode, fromRel, toRel, distance
WITH fromNode, toNode, fromRel, toRel, distance
MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
ON CREATE SET r.distance = distance
RETURN COUNT(*);
But it only creates a few routes, about 330, so I tried running it with apoc.periodic.iterate (with batch sizes of 10, 100, 1000 and 10000). That query just doesn't terminate; I waited multiple hours.
Running the following query manually a few times (I have 180,000 intersections) produces around 300k ROUTE relationships.
MATCH (x:Intersection) WHERE NOT (x)-[:ROUTE]->() WITH x LIMIT 100
CALL spatial.osm.routeIntersection(x,true,false,false)
YIELD fromNode, toNode, fromRel, toRel, distance
WITH fromNode, toNode, fromRel, toRel, distance
MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
ON CREATE SET r.distance = distance
RETURN count(*)
But the number of intersections without ROUTE relationships never reaches zero.
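One possible explanation, offered as a guess rather than a diagnosis, is that some intersections never get an outgoing ROUTE from the procedure (for example because they are only ever the target of a route), so the NOT (x)-[:ROUTE]->() filter keeps selecting them again. A small sketch to check this compares intersections that lack an outgoing ROUTE with those that have no ROUTE in either direction:

```cypher
// Intersections with no outgoing ROUTE (the set the query above re-selects)
MATCH (x:Intersection) WHERE NOT (x)-[:ROUTE]->()
RETURN 'No outgoing ROUTE' AS type, count(x) AS count
UNION
// Intersections with no ROUTE in either direction
MATCH (x:Intersection) WHERE NOT (x)-[:ROUTE]-()
RETURN 'No ROUTE at all' AS type, count(x) AS count;
```

If the second count is much smaller than the first, most of the remaining intersections are already connected and are simply never the start of a route.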
04-10-2022 07:18 AM
After reading this article (https://aura.support.neo4j.com/hc/en-us/articles/1500011138861-Using-apoc-periodic-iterate-and-under...) I came up with the following query:
CALL apoc.periodic.iterate(
'MATCH (x:Intersection) RETURN id(x) as id',
'MATCH (x) WHERE id(x) = id
CALL spatial.osm.routeIntersection(x,true,false,false)
YIELD fromNode, toNode, fromRel, toRel, distance, length, count
WITH fromNode, toNode, fromRel, toRel, distance, length, count
MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
ON CREATE SET r.distance = distance, r.length = length, r.count = count
RETURN count(*)',
{batchSize:100, parallel:false});
This does the job! But I don't know whether the route graph should include only one direction. In my case a graph with 300k ROUTE relationships has been created, but each connection exists in one direction only.
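Whether the routing graph is meant to be directed is not settled in this thread. As a sketch for inspecting the result, the following counts ROUTE relationships that have no counterpart in the opposite direction; the labels and relationship type are the ones used above.

```cypher
// Count ROUTE relationships that exist in one direction only
MATCH (a:Intersection)-[r:ROUTE]->(b:Intersection)
WHERE NOT (b)-[:ROUTE]->(a)
RETURN count(r) AS oneDirectionOnly;
```

If one-way restrictions do not matter for a given use case, a Cypher pattern can also ignore direction when traversing, for example (a)-[:ROUTE]-(b), so a single relationship per pair is enough to route both ways.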