Hello to the Neo4j community. I started developing an OSM-based Neo4j project a few months ago. Everything ran well on a small 40k-node test graph, but I ran into difficulties scaling it to the large production graph with 500 million nodes. Please see the details below.
I have imported OpenStreetMap data using the OSM importer tool (GitHub - neo4j-contrib/osm: OSM Data Model for Neo4j). I tweaked the importer not to create OSMWayNodes, just OSMNodes. The OSMNodes are chained with NEXT relationships and the first node of a way is linked to the OSMWay node through a FIRST_NODE relationship.
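To make the structure concrete, here is a minimal sketch of one way with three nodes. The direction of FIRST_NODE (from the way to its first node), the `way_osm_id` property and the coordinates are assumptions for illustration only; `location` is the point property used in the query further down.

```cypher
// Hypothetical miniature of the graph: one OSMWay and a chain of three OSMNodes.
// `way_osm_id` and the coordinates are placeholders, not values from the real import.
CREATE (w:OSMWay {way_osm_id: 1})
CREATE (a:OSMNode {location: point({latitude: 47.4979, longitude: 19.0402})})
CREATE (b:OSMNode {location: point({latitude: 47.4985, longitude: 19.0410})})
CREATE (c:OSMNode {location: point({latitude: 47.4991, longitude: 19.0419})})
// The way points at its first node; the nodes are chained with NEXT.
CREATE (w)-[:FIRST_NODE]->(a)-[:NEXT]->(b)-[:NEXT]->(c);
```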
Next I calculated and added the 'distance' property to the NEXT relationship with this query:
call apoc.periodic.commit(
'MATCH (an:OSMNode)-[r:NEXT]->(bn:OSMNode)
WHERE NOT exists(r.distance)
WITH an, bn, r LIMIT $limit
SET r.distance = distance(an.location, bn.location)
RETURN COUNT(*)',
{limit: 10000}
);
In a small test graph with 40k nodes the query runs perfectly and produces the desired result. In a large database of 500 million nodes (the final production database), the query ran for 36 hours with no sign of finishing the job. As a result, 49 neostore.transaction.db.X files of 263.3 MB each were created.
There was no index on any of the nodes in the graph. I plan to add an index on the OSMNode.location property for finding the nodes nearest to an input set of coordinates (I tested this on the small graph and it worked).
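For reference, a minimal sketch of such an index and a nearest-node lookup in Neo4j 4.x syntax. The index name and the `$lat`, `$lon` and `$radius` parameters are placeholders chosen for illustration, and the search radius is assumed to be in metres for WGS-84 points.

```cypher
// Index on the point property (the index name is a placeholder).
CREATE INDEX osm_node_location FOR (n:OSMNode) ON (n.location);

// Nearest OSMNode within $radius of the input coordinates.
// $lat, $lon and $radius are hypothetical query parameters.
WITH point({latitude: $lat, longitude: $lon}) AS p
MATCH (n:OSMNode)
WHERE distance(n.location, p) < $radius
RETURN n, distance(n.location, p) AS metres
ORDER BY metres
LIMIT 1;
```

The two statements have to be run separately, because schema changes and data queries cannot share a transaction. Bounding the search with a radius, rather than sorting all nodes by distance, is intended to let the planner use the index; whether it does can be checked with EXPLAIN.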
I am running Neo4j v4.1.3, Neo4j Desktop v1.3.11 and Neo4j Browser v4.2.0.
I have attached a screenshot of the profiling result. Profiling was run on the small database, because there I could easily delete and recreate the distance properties on the NEXT relationships.
My issue is that the query does not scale to the large 500-million-node database. I would like to know whether I need to change the query, the graph structure or the development environment, and what changes I should make to run this query and scale the production graph.
Please let me know if I can give any more data or clarifications. Thanks a lot for the help.
Tiha