OSM import stalls out after Nodes are complete

I was able to get the sample data to import as shown in the OSM GitHub repo. However, I attempted to import a larger dataset (North America, ~15 GB) using the following command:

java -Xms4096m -Xmx4096m -cp "target/osm-0.2.2-neo4j-3.5.1.jar:target/dependency/*" org.neo4j.gis.osm.OSMImportTool --skip-duplicate-nodes --delete --into target/databases/northamerica samples/north-america-latest.osm.bz2

The import seems to stall out after the Nodes stage completes. Should I increase my memory allocation? I currently allocate 4 GB each for the initial and maximum heap, as seen at the beginning of the command. Or should I just be more patient and wait, considering the size of the import?

It's been a while since I imported very large datasets, but I'm pretty sure I succeeded in importing North America and did need to allocate a lot of memory, so that might be your problem. However, I don't think the Java heap matters as much as the additional non-heap memory you can allocate with the --max-memory argument. To see all options, try:

java -Xms1280m -Xmx1280m -cp "target/osm-0.2.2-neo4j-3.5.1.jar:target/dependency/*" org.neo4j.gis.osm.OSMImportTool -h

The options that are relevant to memory are:

--max-memory <max memory that importer can use>
	(advanced) Maximum memory that importer can use for various data structures and 
	caching to improve performance. If left as unspecified (null) it is set to 90% 
	of (free memory on machine - max JVM memory). Values can be plain numbers, like 
	10000000 or e.g. 20G for 20 gigabyte, or even e.g. 70%.
--cache-on-heap Whether or not to allow allocating memory for the cache on heap
	(advanced) Whether or not to allow allocating memory for the cache on heap. If 
	'false' then caches will still be allocated off-heap, but the additional free 
	memory inside the JVM will not be allocated for the caches. This to be able to 
	have better control over the heap memory. Default value: false

I would suggest trying a few options. I needed to increase my memory a lot to get some of the biggest datasets to load. I even allocated extra swap space on my SSD for one dataset to make it work.
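For example, something like this would give the importer 20 GB of off-heap memory for its data structures (the 20G value is just an illustration; pick a figure that fits your machine):

java -Xms4096m -Xmx4096m -cp "target/osm-0.2.2-neo4j-3.5.1.jar:target/dependency/*" org.neo4j.gis.osm.OSMImportTool --skip-duplicate-nodes --delete --max-memory 20G --into target/databases/northamerica samples/north-america-latest.osm.bz2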

With good memory settings, how long should an import of this size take? I want to make sure I adjust my expectations.

I am still struggling with controlling memory to let the import complete. If anyone has some extra Java guidance, I am open to it. Java is a bit outside my wheelhouse.

I just noticed that my available storage keeps dropping

[screenshot: available disk space steadily decreasing]

while the terminal is at this stage of processing

I have watched the available storage on my computer continue to drop. Something seems amiss

If I remember correctly, after the first phase (the 1/4 you see above) is completed, the system builds a bunch of ID mapping tables, which can take a long time before it gets to the next phase, where it will print progress again. That unlogged phase does seem to allocate a lot of memory. Although I built the OSM importer, it is based very much on the 'batch importer' framework, which does all the clever internal magic. There is online help for the batch importer (previously called neo4j-import and later neo4j-admin import), so it might be worth asking general questions about that to get good advice on memory settings.

My own tests involved allocating a LOT of RAM and disk space. I put a 4 TB SSD into my laptop and allocated 200 GB of swap space for virtual RAM, and still I managed to fill the entire 4 TB with an attempted import of planet.osm. I did import many smaller files, but I don't think the one you are trying was on the list. Perhaps I'll get a chance to try again sometime, but this is my daily-use laptop, and big imports like this can take days, so I don't want to tie it up for that long.
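For reference, on Linux a swap file can be added like this (the 64G size is just an example; macOS manages swap automatically, so this only applies to Linux machines):

sudo fallocate -l 64G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
swapon --show

The last command just verifies that the new swap space is active.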

I also notice that the output for estimated disk usage looks way off. I think there are some scale factors in the tool that could be tweaked to make it more reasonable.

OK, thanks for the feedback. I ended up stopping that last attempt to prevent it eating up all my storage and locking up the computer. I am unfamiliar with allocating swap space for virtual RAM, and I may not have enough spare storage to complete the entire North America file in one go.

In the meantime I think I will just download the individual state files and work with those.
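A single-state import should work with the same command, just pointed at the smaller extract; for instance, assuming a Geofabrik-style file name (delaware-latest.osm.bz2 here is only an example):

java -Xms4096m -Xmx4096m -cp "target/osm-0.2.2-neo4j-3.5.1.jar:target/dependency/*" org.neo4j.gis.osm.OSMImportTool --skip-duplicate-nodes --delete --into target/databases/delaware samples/delaware-latest.osm.bz2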