Hi there!
I am trying to import a Turtle dataset of just under 700 MB, which results in roughly 2M nodes.
I am importing it into a Community Edition Neo4j database, started in Docker as follows:
docker run \
--name neo_realm_neo4j \
-p7474:7474 -p7687:7687 \
--add-host host.docker.internal:host-gateway \
-d \
--env NEO4JLABS_PLUGINS='["apoc", "n10s"]' \
--env NEO4J_AUTH=neo4j/test \
--env NEO4J_dbms_unmanaged__extension__classes="n10s.endpoint=/rdf" \
--env NEO4J_dbms_memory_heap_initial__size=2G \
--env NEO4J_dbms_memory_heap_max__size=55G \
-v $HOME/neo4j/data:/data \
-v $HOME/neo4j/logs:/logs \
-v $HOME/neo4j/import:/var/lib/neo4j/import \
-v $HOME/neo4j/plugins:/plugins \
neo4j:latest
and importing it with the following Cypher query:
CALL n10s.rdf.import.fetch('{url}', 'Turtle')
(where {url} is a Docker-accessible URL that returns the Turtle file)
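For completeness, the full sequence I run before the fetch is roughly the following, following the standard n10s setup (the exact constraint syntax may vary slightly depending on the Neo4j version):

CREATE CONSTRAINT n10s_unique_uri FOR (r:Resource) REQUIRE r.uri IS UNIQUE;
CALL n10s.graphconfig.init();
CALL n10s.rdf.import.fetch('{url}', 'Turtle');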
As you can see, the heap is allowed to grow to 55 GB.
The import eventually consumes all of that memory and even slows down noticeably towards the end. After restarting the container everything is persisted correctly, but the database on disk only takes up about 35 GB.
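Note that only the heap sizes are set explicitly; the page cache is left at its default, which I can confirm with something like:

CALL dbms.listConfig() YIELD name, value WHERE name STARTS WITH 'dbms.memory' RETURN name, value;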
Looking around, it seems like a 2M node dataset is rather small compared to what other people work with.
Are there any optimizations I can apply to make this a bit more manageable?
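For example, the n10s import procedures seem to accept a params map with a commitSize option; would explicitly lowering it be a sensible direction? Something like the following (the value is just a guess on my part):

CALL n10s.rdf.import.fetch('{url}', 'Turtle', { commitSize: 5000 });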