Hi
I do not have access to the Neo4j UI console due to port misconfigurations in Kubernetes, but I am able to import data from nodes and relationships files into a db called userdb, and I can also see its folder in the databases directory.
However, when I try to use cypher-shell to create the database, it does not seem to work.
Running "CREATE DATABASE userdb" in cypher-shell results in:
Unsupported administration command: create database userdb
I have searched, and neo4j-admin does not have a create command either.
I would appreciate any help.
Thanks in advance
Open Neo4j Browser and click the database icon at the top left. You will see a dropdown list of databases; in your case, 'neo4j' and 'system'. Select 'system' and run the query SHOW DATABASES, which lists all databases. Then execute the Cypher command CREATE DATABASE userdb. Once it succeeds, rerun SHOW DATABASES and you should see your db name; also check that the db is online. Finally, select your db in the dropdown on the left.
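For reference, the same sequence as Cypher commands (note that CREATE DATABASE requires an edition that supports multiple databases, which is why cypher-shell may reject it on Community Edition):

:use system
SHOW DATABASES;
CREATE DATABASE userdb;
SHOW DATABASES;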
./bin/neo4j-admin database import full neo4j --nodes=/import/nodes.csv --relationships=/import/edges.csv --trim-strings=true
This is the command I have to run to load the data into the Neo4j pod, but when I tried to execute it inside the pod (Neo4j is running in Kubernetes), it asked me to shut down the server. When I did that, it kicked me out of the pod, so I am now wondering where I should run the neo4j-admin command.
I could use LOAD CSV, which I have used in the past, but that requires me to understand the schema and write the LOAD CSV statements appropriately wherever I want to make connections.
However, in this case it is an external dataset, so please suggest how I should go about loading it.
Once the maintenance-mode setting is applied, the pod will restart, but Neo4j won't be running. Instead, it will run a dummy process that prints a message indicating it's in maintenance mode.
Then you can exec into the pod and perform your offline import.
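Assuming the deployment uses the official Neo4j Helm chart, the sequence might look roughly like this (release and pod names are placeholders, and offlineMaintenanceModeEnabled is the chart setting I believe enables this mode; adjust the database name and paths to your setup):

# Switch the instance into offline maintenance mode (Neo4j Helm chart setting)
helm upgrade my-release neo4j/neo4j --reuse-values --set neo4j.offlineMaintenanceModeEnabled=true
# Exec into the pod once it has restarted with the dummy maintenance process
kubectl exec -it my-release-0 -- bash
# Inside the pod, run the offline import while the server is down
bin/neo4j-admin database import full userdb --nodes=/import/nodes.csv --relationships=/import/edges.csv --trim-strings=true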
This is what I have ended up using for loading the nodes:
LOAD CSV WITH HEADERS FROM "file:///nodes_load.csv" AS row
CALL apoc.create.node([row.LABEL], {node_id: row.node_id, node_index: row.node_index})
YIELD node
RETURN count(*)
And it was quick.
However, loading the relationships, which number about 5M, is much slower:
Do you have indexes on START_ID and END_ID? Not having those (indexes that let you find the relationship endpoints quickly) is almost always the cause of poor performance when loading relationships.
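For example, a uniqueness constraint on the lookup property creates a backing index automatically (the :Node label here is an assumption based on the node-loading query above; adjust it to your actual labels):

CREATE CONSTRAINT node_id_unique IF NOT EXISTS
FOR (n:Node) REQUIRE n.node_id IS UNIQUE;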
Not that I'm suggesting apoc is the cause of your performance issue, but why complicate matters with apoc when this could have been done without it, as described at
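For instance, if all your nodes share a single label, a plain LOAD CSV version could be as simple as the sketch below (the :Node label is an assumption; if the label genuinely varies per row, as row.LABEL suggests, apoc.create.node is the usual workaround, since plain Cypher cannot set labels dynamically):

LOAD CSV WITH HEADERS FROM "file:///nodes_load.csv" AS row
CREATE (n:Node {node_id: row.node_id, node_index: row.node_index})
RETURN count(*)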
I had constraints set on all the source and target properties, so I assumed those would create the indexes as well.
I let the script run over the weekend and it took about 36 hours for 130k nodes and 4.5M relationships.
Thanks for all the help
It is documented that creating property uniqueness or key constraints will create indexes as well. Having said that, it still seems pretty slow - did you try using LOAD CSV directly, without involving apoc?
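For 5M relationships, a batched plain-Cypher load might look like this sketch (the relationship type, label, and CSV column names are assumptions; CALL { ... } IN TRANSACTIONS needs Neo4j 4.4+ and the :auto prefix when run from Browser or cypher-shell):

:auto LOAD CSV WITH HEADERS FROM "file:///edges.csv" AS row
CALL {
  WITH row
  // index-backed lookups of the two endpoints
  MATCH (a:Node {node_id: row.START_ID})
  MATCH (b:Node {node_id: row.END_ID})
  CREATE (a)-[:CONNECTED_TO]->(b)
} IN TRANSACTIONS OF 10000 ROWS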