Preparing OSM data for routing

I've been trying to follow the GraphConnect 2018 video on loading OSM data into a routable graph (https://neo4j.com/graphconnect-2018/session/neo4j-spatial-mapping). All goes well until I try the Cypher shown at 21:26. If I run the Cypher exactly as shown (including 'LIMIT 100' on the match), won't that only set up a [:ROUTE] relationship for 100 intersections? Regardless, if I try to batch process the job via apoc.periodic.iterate, it seems to crash the Neo4j server. There is nothing obvious in the logs, just the Cypher executed, followed by

… in separate thread

and then

INFO [o.n.g.f.GraphDatabaseFacadeFactory] Shutdown started

Any ideas on how to execute this across all matching nodes? I've tried invoking the procedure using the different parameters given as an example on the repo readme:

CALL spatial.osm.routeIntersection(x,false,false,false)

but get the same result. I've even tried running

CALL spatial.osm.routeIntersection(x,true,true,true)

which according to the docs creates the relationship minus the distance property, but that too causes a server crash if run for more than 100 nodes.

Any help appreciated, thanks!


In the presentation I showed versions of the queries that had LIMIT in them and did not use apoc.periodic.iterate only because they were nicer to show visually, but in building the graph I certainly used the periodic.iterate versions all the time, as you suspected.

The symptoms you describe suggest you are probably running out of memory. I know I needed to tweak memory settings to make the most of my RAM, and the apoc.periodic.iterate settings were also important for getting the best performance and memory usage. I don't have records of the exact tweaking I did, but I do have a copy of the notes I took for the queries I ran.
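Before changing anything, it may help to confirm which memory values the server actually picked up. The query below is only a sketch: it assumes a Neo4j 3.x server where the built-in dbms.listConfig procedure is available and accepts a search string, and that the memory settings share the 'dbms.memory' prefix.

```cypher
// Assumption: Neo4j 3.x, where dbms.listConfig takes an optional search string.
// Lists the effective memory settings (heap, page cache) as the server sees them.
CALL dbms.listConfig('dbms.memory')
YIELD name, value
RETURN name, value
ORDER BY name;
```

If the values shown differ from what is in neo4j.conf, the server may be reading a different configuration file than expected.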

Here are the queries relevant to building the routing graph:

//
// Identify (:OSMNode) instances that are intersections (connected INDIRECTLY to more than one (:OSMWayNode) and on ways or relations that are also streets).
//

MATCH (n:OSMNode)
  WHERE size((n)<-[:NODE]-(:OSMWayNode)-[:NEXT]-(:OSMWayNode)) > 2
  AND NOT (n:Intersection)
WITH n LIMIT 100
MATCH (n)<-[:NODE]-(wn:OSMWayNode), (wn)<-[:NEXT*0..100]-(wx),
      (wx)<-[:FIRST_NODE]-(w:OSMWay)-[:TAGS]->(wt:OSMTags)
  WHERE exists(wt.highway) AND NOT n:Intersection
SET n:Intersection
RETURN COUNT(*);

// Periodic iterate

CALL apoc.periodic.iterate(
'MATCH (n:OSMNode) WHERE NOT (n:Intersection)
 AND size((n)<-[:NODE]-(:OSMWayNode)-[:NEXT]-(:OSMWayNode)) > 2 RETURN n',
'MATCH (n)<-[:NODE]-(wn:OSMWayNode), (wn)<-[:NEXT*0..100]-(wx),
       (wx)<-[:FIRST_NODE]-(w:OSMWay)-[:TAGS]->(wt:OSMTags)
   WHERE exists(wt.highway) AND NOT n:Intersection
 SET n:Intersection',
{batchSize:10000, parallel:true});

MATCH (i:OSMNode) RETURN 'OSM Nodes' AS type, count(i)
UNION
MATCH (i:OSMPathNode) RETURN 'Nodes on paths' AS type, count(i)
UNION
MATCH (i:PointOfInterest) RETURN 'Points of interest' AS type, count(i)
UNION
MATCH (i:Intersection) RETURN 'Intersections' AS type, count(i);


// NY produced 50k intersections in 185s
// US-NE took 45 minutes to produce 789505 intersections
// San Francisco took 16s to label nodes as Intersections

// San Francisco
//╒════════════════════╤══════════╕
//│"type"              │"count(i)"│
//╞════════════════════╪══════════╡
//│"OSM Nodes"         │2880804   │
//├────────────────────┼──────────┤
//│"Nodes on paths"    │235730    │
//├────────────────────┼──────────┤
//│"Points of interest"│3124      │
//├────────────────────┼──────────┤
//│"Intersections"     │53744     │
//└────────────────────┴──────────┘

//
// Find and connect intersections into routes
//

MATCH (x:Intersection) WITH x LIMIT 100
  CALL spatial.osm.routeIntersection(x,true,false,false)
  YIELD fromNode, toNode, fromRel, toRel, distance, length, count
WITH fromNode, toNode, fromRel, toRel, distance, length, count
MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
  ON CREATE SET r.distance = distance, r.length = length, r.count = count
RETURN COUNT(*);

// With Periodic Iterate:

CALL apoc.periodic.iterate(
'MATCH (x:Intersection) RETURN x',
'CALL spatial.osm.routeIntersection(x,true,false,false)
   YIELD fromNode, toNode, fromRel, toRel, distance, length, count
 WITH fromNode, toNode, fromRel, toRel, distance, length, count
 MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
   ON CREATE SET r.distance = distance, r.length = length, r.count = count
 RETURN count(*)',
{batchSize:100, parallel:false});

// San Francisco took 103s to perform 54k committed operations

// If there are errors, repeat with a smaller batch size to better cope with stack overflow errors

CALL apoc.periodic.iterate(
'MATCH (x:Intersection) WHERE NOT (x)-[:ROUTE]->() RETURN x',
'CALL spatial.osm.routeIntersection(x,true,false,false)
   YIELD fromNode, toNode, fromRel, toRel, distance, length, count
 WITH fromNode, toNode, fromRel, toRel, distance, length, count
 MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
   ON CREATE SET r.distance = distance, r.length = length, r.count = count
 RETURN count(*)',
{batchSize:10, parallel:false});
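To see whether any batches actually failed, and why, the summary row returned by apoc.periodic.iterate can be inspected. This is a minimal sketch: the YIELD field names (batches, total, failedBatches, failedOperations, errorMessages) are the ones I understand APOC to return and may differ between APOC versions. The second query is a simple progress check that counts intersections that still have no outgoing route.

```cypher
// Same retry as above, but returning the summary so failures are visible.
// Assumption: these YIELD fields exist in the installed APOC version.
CALL apoc.periodic.iterate(
'MATCH (x:Intersection) WHERE NOT (x)-[:ROUTE]->() RETURN x',
'CALL spatial.osm.routeIntersection(x,true,false,false)
   YIELD fromNode, toNode, fromRel, toRel, distance, length, count
 WITH fromNode, toNode, fromRel, toRel, distance, length, count
 MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
   ON CREATE SET r.distance = distance, r.length = length, r.count = count
 RETURN count(*)',
{batchSize:10, parallel:false})
YIELD batches, total, failedBatches, failedOperations, errorMessages
RETURN batches, total, failedBatches, failedOperations, errorMessages;

// Progress check: how many intersections still have no outgoing ROUTE?
MATCH (x:Intersection) WHERE NOT (x)-[:ROUTE]->()
RETURN count(x) AS remaining;
```

Because each batch commits separately, the remaining count should shrink between runs even if a run is interrupted, which gives a rough idea of how far the job got.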

// Now find Routable nodes from the PointOfInterest search and link them to the route map

MATCH (x:Routable:OSMNode)
  WHERE NOT (x)-[:ROUTE]->(:Intersection) WITH x LIMIT 100
CALL spatial.osm.routeIntersection(x,true,false,false)
  YIELD fromNode, toNode, fromRel, toRel, distance, length, count
WITH fromNode, toNode, fromRel, toRel, distance, length, count
MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
  ON CREATE SET r.distance = distance, r.length = length, r.count = count
RETURN COUNT(*);

// With periodic iterate

CALL apoc.periodic.iterate(
'MATCH (x:Routable:OSMNode)
   WHERE NOT (x)-[:ROUTE]->(:Intersection) RETURN x',
'CALL spatial.osm.routeIntersection(x,true,false,false)
   YIELD fromNode, toNode, fromRel, toRel, distance, length, count
 WITH fromNode, toNode, fromRel, toRel, distance, length, count
 MERGE (fromNode)-[r:ROUTE {fromRel:id(fromRel),toRel:id(toRel)}]->(toNode)
   ON CREATE SET r.distance = distance, r.length = length, r.count = count
 RETURN count(*)',
{batchSize:10, parallel:false});

// SF took 16s to do 1538 committed operations

// The algorithm creates self relationships, so delete them with

MATCH (a:Intersection)-[r:ROUTE]->(a) DELETE r RETURN COUNT(*);

// SF had 402 self relationships

// Now to get an idea of the distribution of route distances

MATCH (a:Intersection)-[r:ROUTE]->() RETURN 'All routes' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 25 RETURN '>25m' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 50 RETURN '>50m' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 100 RETURN '>100m' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 250 RETURN '>250m' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 500 RETURN '>500m' AS type, COUNT(*) AS count
UNION
MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 5000 RETURN '>5000m' AS type, COUNT(*) AS count;

// SF
//╒════════════╤═══════╕
//│"type"      │"count"│
//╞════════════╪═══════╡
//│"All routes"│86315  │
//├────────────┼───────┤
//│">25m"      │55662  │
//├────────────┼───────┤
//│">50m"      │40227  │
//├────────────┼───────┤
//│">100m"     │18992  │
//├────────────┼───────┤
//│">250m"     │3976   │
//├────────────┼───────┤
//│">500m"     │1174   │
//├────────────┼───────┤
//│">5000m"    │59     │
//└────────────┴───────┘
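The same distribution can probably be gathered in a single pass over the ROUTE relationships instead of seven separate matches. This is a sketch of an alternative, not the query that produced the table above; the column aliases (allRoutes, over25m, and so on) are made up for illustration. The thresholds stay cumulative, so a 600m route is counted in every column up to over500m.

```cypher
// One scan of the ROUTE relationships, one conditional sum per threshold.
MATCH (:Intersection)-[r:ROUTE]->()
RETURN count(*) AS allRoutes,
       sum(CASE WHEN r.distance > 25   THEN 1 ELSE 0 END) AS over25m,
       sum(CASE WHEN r.distance > 50   THEN 1 ELSE 0 END) AS over50m,
       sum(CASE WHEN r.distance > 100  THEN 1 ELSE 0 END) AS over100m,
       sum(CASE WHEN r.distance > 250  THEN 1 ELSE 0 END) AS over250m,
       sum(CASE WHEN r.distance > 500  THEN 1 ELSE 0 END) AS over500m,
       sum(CASE WHEN r.distance > 5000 THEN 1 ELSE 0 END) AS over5000m;
```

This returns one row with a column per threshold rather than one row per threshold, so it reads differently from the UNION output but reports the same counts.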

// To improve inner-city routing we can optionally remove some of the longer routes, which might be falsely detected

MATCH (a:Intersection)-[r:ROUTE]->() WHERE r.distance > 500 DELETE r RETURN COUNT(*);

Many thanks Craig, will take another look at it this week and try some of your suggestions.

Are there any relevant memory settings I can look at besides the server page cache and heap size? With generous settings for both, which have worked for other large imports, I'm still getting the same server crash, even when I try your small-batch iterator using batchSize:10 (and even batchSize:1).