I am using the following script to import data into Neo4j 3.5 (running in Docker):
cd /sparkwiki/helpers/
target_db="neo4j"
delim="\t"
data_dir=/wikiout_german
part_template="part-\d{5}-.*.csv.gz"
neo4j-admin import \
--database=$target_db --delimiter=$delim \
--report-file=/tmp/import-wiki.log \
--id-type=INTEGER \
--nodes:Page import/page_header.csv,"$data_dir/page/normal_pages/$part_template" \
--nodes:Page:Category import/page_header.csv,"$data_dir/page/category_pages/$part_template" \
--relationships:LINKS_TO import/pagelinks_header.csv,"$data_dir/pagelinks/$part_template" \
--relationships:BELONGS_TO import/categorylinks_header.csv,"$data_dir/categorylinks/$part_template" \
--ignore-missing-nodes
I believe the script itself is fine, because the same import worked with Neo4j 4.0 (after minor syntax adjustments).
The import finishes without any errors and prints the following output:
IMPORT DONE in 7m 7s 93ms.
Imported:
8075624 nodes
598327030 relationships
32302496 properties
Peak memory usage: 1.35 GB
There were bad entries which were skipped and logged into /tmp/import-wiki.log
However, when I open the Neo4j browser to check the result, it shows no nodes at all. This is the docker-compose snippet for the Neo4j service:
neo4jwikidevde:
  build:
    context: ./docker
    dockerfile: neo4j/Dockerfile
  environment:
    - NEO4J_AUTH=neo4j/test
  volumes:
    - data_de:/var/lib/neo4j/data
    - logs_de:/logs
    - import_de:/var/lib/neo4j/import
    - ./wikiout_german:/wikiout_german
  networks:
    - internal_t2g
    - external-network
  ports:
    - 7475:7474
    - 7688:7687
Notes:
- It was suggested that I restart the Neo4j container after the import completes and check again. I have already tried that, and it didn't help.
- With Neo4j 4.0 the data was imported and visible, but my Python code could not connect to that version. That is why I am trying 3.5, which works with my other codebases.
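
One check that may be relevant to the empty browser: as far as I understand, a Neo4j 3.5 server serves only the database named by `dbms.active_database` in `neo4j.conf`, which defaults to `graph.db`, while the import above writes to `$target_db`. Below is a minimal sketch for comparing the two. The helper name `active_db` is hypothetical, and the config path in the usage comment assumes the layout of the official image, which may not hold for the custom Dockerfile:

```shell
# Hypothetical helper: print the database name that a Neo4j 3.5 config file activates.
# Falls back to graph.db, the 3.5 default, when the setting is absent or commented out.
active_db() {
  conf_file="$1"
  # Keep the value of the last uncommented dbms.active_database line, if any.
  value=$(sed -n 's/^dbms\.active_database=//p' "$conf_file" 2>/dev/null | tail -n 1) || true
  echo "${value:-graph.db}"
}

# Usage inside the container (path is an assumption):
#   served=$(active_db /var/lib/neo4j/conf/neo4j.conf)
#   [ "$served" = "$target_db" ] || echo "import target '$target_db' differs from served database '$served'"
```

If the two names differ, the imported store would sit in `data/databases/` next to the database the server actually opens, which would be consistent with a successful import and an empty browser.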
I have tried quite a few things, but nothing seems to work. Any suggestions would be highly appreciated.