Neo4j admin import functionality

I am currently using Neo4j Community Edition 4.3.4.
I am using the apoc.import.json API to bulk-import data into the Neo4j database. However, due to performance issues, I am trying to switch to neo4j-admin import, which is said to be much faster.

I am using the command as below:
bin/neo4j-admin import --database=neo4j --nodes=data/WithoutHASHeader.csv,data/nodes.csv --relationships=data/WithHASHeader.csv,data/relation.csv
I have placed all the required CSV files inside the data folder; the command runs fine and reports success.
However, when I check the Neo4j Browser, I do not see any nodes or relationships added.
And when I restart Neo4j, the neo4j database becomes unavailable and is no longer accessible.
As the command works only on a newly created database, I delete all the database-related files before executing it so that it completes successfully.

As per the explanation given here:

It seems the command is only partially usable in the Community Edition, because after executing it, the CREATE DATABASE command has to be issued, and that command is not available in the Community Edition.

Hence my query is,

  1. Is the neo4j-admin import command fully available in the Community Edition?
  2. How can the nodes and relationships be seen in the neo4j database after executing the command?

If it helps:
I tried to perform the same example as explained in Neo4j Admin import - Operations Manual.
With this, the command exits with the below success message:
IMPORT DONE in 1s 410ms.
Imported:
6 nodes
9 relationships
24 properties
Peak memory usage: 1.004GiB
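
The counts in this summary are a useful baseline to compare against once the database is online. A minimal sketch of pulling them out of the import output, assuming the summary format shown above; imported_count is a hypothetical helper name, not part of Neo4j:

```shell
# Hypothetical helper: print the count reported for one kind of item
# ("nodes", "relationships" or "properties") in neo4j-admin import output.
imported_count() {
    # $1 = kind; the import output is read from stdin
    awk -v kind="$1" '$2 == kind && $1 ~ /^[0-9]+$/ { print $1; exit }'
}

# Sample summary copied from the output above.
summary='IMPORT DONE in 1s 410ms.
Imported:
6 nodes
9 relationships
24 properties'

printf '%s\n' "$summary" | imported_count nodes
printf '%s\n' "$summary" | imported_count relationships
```

If the database starts correctly, running MATCH (n) RETURN count(n) in the Neo4j Browser should return the same node count.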

After that, on restarting Neo4j, the neo4j database is no longer selected by default in the Neo4j Browser, as it normally is.
When I try to select it with the command ":use neo4j", the following error is shown:
"Database "neo4j" is unavailable, its status is "offline"."

The result of the command - "show databases" is as below:

name | address | role | requestedStatus | currentStatus | error | default | home
neo4j | localhost:7687 | standalone | online | offline | An error occurred! Unable to start DatabaseId{ff89d9fc[neo4j]}. | true | true
system | localhost:7687 | standalone | online | online | | false | false

@sumagowrishan

Does your logs/debug.log provide details of why the database failed to come online?

I have not checked the logs yet. I will check them and post the results.
However, my question is: will neo4j-admin import work on the Community Edition?
I am not able to stop or start the database using the STOP/START DATABASE neo4j commands.
They result in an unauthorized admin command error.
So, will the import work without stopping the database?

debug.txt (268.4 KB)
Here is the log file.
In case it helps:
As the import command does not work on a running database, and I am on the Community Edition, I cannot stop the database using the STOP command. Hence I am deleting the neo4j database content with the rm -rf data/databases/neo4j/* command.
After this, I execute the admin import command.
I am not sure whether the above rm -rf command is causing the issue.

@sumagowrishan
Not related to the failure so far, but the debug.log you attached includes

2022-01-20 05:00:19.335+0000 WARN  [o.n.k.i.JvmChecker] The max heap memory has not been configured. It is recommended that it is always explicitly configured, to ensure the system has a balanced configuration. Until then, a JVM computed heuristic of 6643777536 bytes is used instead. If you are running neo4j server, you need to configure dbms.memory.heap.max_size in neo4j.conf. If you are running neo4j embedded, you have to launch the JVM with -Xmx set to a value. You can run neo4j-admin memrec for memory configuration suggestions.
2022-01-20 05:00:19.338+0000 WARN  [o.n.k.i.JvmChecker] The initial heap memory has not been configured. It is recommended that it is always explicitly configured, to ensure the system has a balanced configuration. Until then, a JVM computed heuristic of 415236096 bytes is used instead. If you are running neo4j server, you need to configure dbms.memory.heap.initial_size in neo4j.conf. If you are running neo4j embedded, you have to launch the JVM with -Xms set to a value. You can run neo4j-admin memrec for memory configuration suggestions.

Are there plans to address these warnings?

Per Import - Operations Manual

However, using the import command of neo4j-admin is generally faster since it is run against a stopped and empty database. This section describes the neo4j-admin import option. For information on LOAD CSV, see the Cypher Manual → LOAD CSV.

and the key pieces of the above are
a. the database must be empty
b. the database must be stopped

You can't simply rm the data/databases/<database> files, because the transaction logs under data/transactions/<database> are left behind.
Your debug.log reports

2022-01-20 08:30:50.579+0000 ERROR [o.n.d.d.DefaultDatabaseManager] Failed to start DatabaseId{d16d0e34[neo4j]}
org.neo4j.dbms.api.DatabaseManagementException: An error occurred! Unable to start `DatabaseId{d16d0e34[neo4j]}`.
	at org.neo4j.dbms.database.AbstractDatabaseManager.startDatabase(AbstractDatabaseManager.java:192) ~[neo4j-4.3.4.jar:4.3.4]
	at org.neo4j.dbms.database.DefaultDatabaseManager.startDatabase(DefaultDatabaseManager.java:156) ~[neo4j-4.3.4.jar:4.3.4]
	at org.neo4j.dbms.database.DefaultDatabaseManager.initialiseDefaultDatabase(DefaultDatabaseManager.java:67) ~[neo4j-4.3.4.jar:4.3.4]
	at org.neo4j.dbms.database.DefaultDatabaseInitializer.start0(DefaultDatabaseInitializer.java:39) ~[neo4j-kernel-4.3.4.jar:4.3.4]
	at org.neo4j.kernel.lifecycle.SafeLifecycle.transition(SafeLifecycle.java:124) [neo4j-common-4.3.4.jar:4.3.4]
	at org.neo4j.kernel.lifecycle.SafeLifecycle.start(SafeLifecycle.java:138) [neo4j-common-4.3.4.jar:4.3.4]
	at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:442) [neo4j-common-4.3.4.jar:4.3.4]
	at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:110) [neo4j-common-4.3.4.jar:4.3.4]
	at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.startDatabaseServer(DatabaseManagementServiceFactory.java:205) [neo4j-4.3.4.jar:4.3.4]
	at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.build(DatabaseManagementServiceFactory.java:170) [neo4j-4.3.4.jar:4.3.4]
	at org.neo4j.server.CommunityBootstrapper.createNeo(CommunityBootstrapper.java:36) [neo4j-4.3.4.jar:4.3.4]
	at org.neo4j.server.NeoBootstrapper.start(NeoBootstrapper.java:134) [neo4j-4.3.4.jar:4.3.4]
	at org.neo4j.server.NeoBootstrapper.start(NeoBootstrapper.java:90) [neo4j-4.3.4.jar:4.3.4]
	at org.neo4j.server.CommunityEntryPoint.main(CommunityEntryPoint.java:34) [neo4j-4.3.4.jar:4.3.4]
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Mismatching store id. Store StoreId: StoreId{creationTime=1642667351318, randomId=-1614129228406106886, storeVersion=3471768636287762695, upgradeTime=1642667351318, upgradeTxId=1}. Transaction log StoreId: StoreId{creationTime=1642592237908, randomId=-2040892648800860919, storeVersion=3471768636287762695, upgradeTime=1642592237908, upgradeTxId=1}
	at org.neo4j.kernel.database.Database.handleStartupFailure(Database.java:626) ~[neo4j-kernel-4.3.4.jar:4.3.4]
	at org.neo4j.kernel.database.Database.start(Database.java:523) ~[neo4j-kernel-4.3.4.jar:4.3.4]
	at org.neo4j.dbms.database.AbstractDatabaseManager.startDatabase(AbstractDatabaseManager.java:188) ~[neo4j-4.3.4.jar:4.3.4]
	... 13 more
Caused by: java.lang.RuntimeException: Mismatching store id. Store StoreId: StoreId{creationTime=1642667351318, randomId=-1614129228406106886, storeVersion=3471768636287762695, upgradeTime=1642667351318, upgradeTxId=1}. Transaction log StoreId: StoreId{creationTime=1642592237908, randomId=-2040892648800860919, storeVersion=3471768636287762695, upgradeTime=1642592237908, upgradeTxId=1}
	at org.neo4j.kernel.recovery.Recovery.validateStoreId(Recovery.java:408) ~[neo4j-kernel-4.3.4.jar:4.3.4]
	at org.neo4j.kernel.database.Database.checkStoreId(Database.java:633) ~[neo4j-kernel-4.3.4.jar:4.3.4]
	at org.neo4j.kernel.database.Database.validateStoreAndTxLogs(Database.java:596) ~[neo4j-kernel-4.3.4.jar:4.3.4]
	at org.neo4j.kernel.database.Database.start(Database.java:417) ~[neo4j-kernel-4.3.4.jar:4.3.4]
	at org.neo4j.dbms.database.AbstractDatabaseManager.startDatabase(AbstractDatabaseManager.java:188) ~[neo4j-4.3.4.jar:4.3.4]
	... 13 more
2022-01-20 08:30:50.618+0000 INFO  [o.n.b.BoltServer] Bolt enabled on 0.0.0.0:7687.

suggesting there is an issue with the transaction logs.
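
Reading the two StoreId values side by side makes the failure easier to see: the store files and the transaction logs carry different randomId and creationTime values, which is what one would expect if the store was recreated by the import while older transaction logs stayed in place. A minimal sketch for pulling those fields out of a log line; store_field is a hypothetical helper name, and the sample line is copied from the stack trace above:

```shell
# Hypothetical helper: print every value of one StoreId field found on stdin,
# one per line, in the order they appear (store first, transaction log second).
store_field() {
    # $1 = field name, e.g. randomId or creationTime
    grep -oE "$1=-?[0-9]+" | cut -d= -f2
}

line='Mismatching store id. Store StoreId: StoreId{creationTime=1642667351318, randomId=-1614129228406106886, storeVersion=3471768636287762695, upgradeTime=1642667351318, upgradeTxId=1}. Transaction log StoreId: StoreId{creationTime=1642592237908, randomId=-2040892648800860919, storeVersion=3471768636287762695, upgradeTime=1642592237908, upgradeTxId=1}'

store_id=$(printf '%s\n' "$line" | store_field randomId | sed -n 1p)
txlog_id=$(printf '%s\n' "$line" | store_field randomId | sed -n 2p)

if [ "$store_id" != "$txlog_id" ]; then
    echo "store ($store_id) and transaction log ($txlog_id) do not belong together"
fi
```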

To rerun neo4j-admin import against an existing database, include the argument --force=true, for which the help text reports


      --force[=<true/false>] Force will delete any existing database files
                               prior to the import.
                               Default: false

Thank you very much for the detailed response.
Regarding two of the above mentioned items:

  1. the database must be stopped: This condition cannot be met as I am currently using the Neo4j Community Edition, where executing STOP DATABASE neo4j results in an unauthorized command error.
  2. Regarding the --force flag, I saw this suggestion on a couple of sites and tried it. However, I get an error saying there is no such flag.

@sumagowrishan

regarding

the database must be stopped: This condition cannot be met as I am currently using the Neo4j Community Edition, where executing STOP DATABASE neo4j results in an unauthorized command error.

As the Community Edition only allows one 'user' database, simply stop the entire Neo4j process, for example with bin/neo4j stop

and with reference to

--force=true is a valid argument for neo4j-admin import.

I'm confused as initially you reported

I am using the command as below:
bin/neo4j-admin import --database=neo4j --nodes=data/WithoutHASHeader.csv,data/nodes.csv --relationships=data/WithHASHeader.csv,data/relation.csv

but from your screenshot you are now using neo4j-admin load?

I am extremely sorry for the wrong image explaining the command flag usage.

root@3995e74e1666:/var/lib/neo4j# bin/neo4j-admin import --nodes data/movies.csv --nodes data/actors.csv --relationships data/roles.csv --force=true
Selecting JVM - Version:11.0.13, Name:OpenJDK 64-Bit Server VM, Vendor:Oracle Corporation
Unknown option: '--force=true'

USAGE

neo4j-admin import [--expand-commands] [--verbose] [--cache-on-heap
[=<true/false>]] [--high-io[=<true/false>]]
[--ignore-empty-strings[=<true/false>]]
[--ignore-extra-columns[=<true/false>]]
[--legacy-style-quoting[=<true/false>]] [--multiline-fields
[=<true/false>]] [--normalize-types[=<true/false>]]
[--skip-bad-entries-logging[=<true/false>]]
[--skip-bad-relationships[=<true/false>]]
[--skip-duplicate-nodes[=<true/false>]] [--trim-strings
[=<true/false>]] [--additional-config=]
[--array-delimiter=] [--bad-tolerance=]
[--database=] [--delimiter=]
[--id-type=<STRING|INTEGER|ACTUAL>]
[--input-encoding=] [--max-memory=]
[--processors=] [--quote=]
[--read-buffer-size=] [--report-file=] --nodes=
[[:]...=]... [--nodes=[[:
]...=]...]... [--relationships=[=]
...]...

@sumagowrishan

Apologies, --force=true was only added in 4.3.8; see Neo4j 4.3 changelog · neo4j/neo4j Wiki · GitHub
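
Since the flag depends on the installed version, a script could guard on it before deciding how to clear the database. A minimal sketch, assuming the version string is supplied by hand (4.3.4 here, as in this thread) and that sort -V is available; version_at_least is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), which is an assumption about the environment.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

installed="4.3.4"   # version reported in this thread

if version_at_least "$installed" "4.3.8"; then
    echo "--force should be available"
else
    echo "--force is not available; remove the database files manually"
fi
```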

Your options are either
a. upgrade to 4.3.8 or later
or
b. stop Neo4j, then run

        rm -rf data/databases/<databaseName>
        rm -rf data/transactions/<databaseName>

replacing <databaseName> with the actual name of the database
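
Option b can be wrapped in a small function so that both directories are always removed together. This is only a sketch under stated assumptions: reset_database is a hypothetical helper, Neo4j must already be stopped (for example with bin/neo4j stop), and the first argument is assumed to be the Neo4j data directory. The demo runs against a throwaway directory so nothing real is touched:

```shell
# Hypothetical helper: remove the store files and the transaction logs of one database.
# Assumes Neo4j is already stopped. $1 = data directory, $2 = database name.
reset_database() {
    data_dir=$1
    db_name=$2
    # Refuse empty arguments so the rm below can never hit a parent directory.
    [ -n "$data_dir" ] && [ -n "$db_name" ] || return 1
    rm -rf "$data_dir/databases/$db_name" "$data_dir/transactions/$db_name"
}

# Demo on a temporary directory that mimics the data/ layout.
demo=$(mktemp -d)
mkdir -p "$demo/databases/neo4j" "$demo/transactions/neo4j" "$demo/databases/system"
reset_database "$demo" neo4j
ls "$demo/databases"    # the system database is left alone
rm -rf "$demo"
```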

Thank you @dana.canzano .
With the above set of commands, I am able to successfully import the data.
Once again thank you for all the information provided.

@dana.canzano ,
I have one follow-up query here.
The import now completes successfully. However, I am not able to see the nodes or relationships.
Could you please let me know if I am missing something in these steps?

@sumagowrishan

Are you able to connect to the database, and upon connecting is it simply empty?

Yes, exactly...
I am performing below operations:

echo "neo4j operations"
cd /var/lib/neo4j
bin/neo4j stop
rm -rf data/databases/neo4j/
rm -rf data/transactions/neo4j/
bin/neo4j-admin import --database=neo4j --nodes=/data/nodeheader.csv,/data/nodes.csv --relationships=/data/relationheader.csv,/data/relationships.csv > import.log
tail -6 import.log

and then restarting the neo4j process as below:

echo "Import is done"
NEO4JPID=$(ps -ef | awk '$8=="/jdk/bin/java" {print $2}')

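# Quote the variable so the test still works when no process was found.
if [ -z "$NEO4JPID" ]
then
    echo "No neo4j process"
else
    echo "$NEO4JPID is the neo4j pid"
    # Left unquoted on purpose: several PIDs are passed as separate arguments.
    kill $NEO4JPID
    sleep 10
    echo "after sleep of ten restarting neo4j"
fi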

/neo4j-entrypoint.sh > neo4jLog &

The content of neo4j-entrypoint.sh is as below:

#!/bin/bash
cd /var/lib/neo4j

#bin/neo4j console
bin/neo4j start

After all this, when I refresh the Neo4j Browser, I do not see any data.

@sumagowrishan

  1. rather than
bin/neo4j stop
rm -rf data/databases/neo4j/
rm -rf data/transactions/neo4j/

why not simply run the Cypher statement

drop database neo4j;

and then

neo4j stop

and then post

neo4j-admin import .... .....
neo4j start

and then the Cypher command

create database neo4j;

DROP DATABASE and CREATE DATABASE are not supported as I am using the Community Edition.