Neo4j-admin import uses only one cpu core after a while

wallerprogramm · March 1, 2019, 5:18pm

I'm currently importing nodes and relationships from several CSV files with a total size of ~45 GB using neo4j-admin import. At the beginning all 4 CPU cores were used, but since (at least) 55% of the node import (stage 1/4) only one core has been in use. The import has now been running for over 16 hours and is still at 60%. You can see that in the following console output:

I use Neo4j version: 3.5.2 (Neo4j Desktop Version 1.1.15).

I started the import with the following command:

./bin/neo4j-admin import \
	--mode=csv \
	--database=btctest.db \
	--nodes $HEADERS/addresses-header.csv,$DATA/addresses.csv \
	--nodes $HEADERS/blocks-header.csv,$DATA/blocks.csv \
	--nodes $HEADERS/transactions-header.csv,$DATA/transactions.csv \
	--relationships $HEADERS/before_rel-header.csv,$DATA/before_rel.csv \
	--relationships $HEADERS/belongs_to_rel-header.csv,$DATA/belongs_to_rel.csv \
	--relationships $HEADERS/receives_rel-header.csv,$DATA/receives_rel.csv \
	--relationships $HEADERS/sends_rel-header.csv,$DATA/sends_rel.csv \
	--ignore-missing-nodes=true \
	--ignore-duplicate-nodes=true \
	--multiline-fields=true \
	--high-io=true

The headers look like this (as an example, one node header and one relationship header):
node (transactions-header.csv):

txid:ID,:LABEL

relationship (sends_rel-header.csv):

:START_ID,value,:END_ID,:TYPE

Is it normal that Neo4j uses only one CPU core after a while? And is it normal that an import with the import tool takes this long? Do you have any recommendations on how to make it faster? By the way, I'm using an SSD.

michael.hunger · March 2, 2019, 1:51am

Hmm, usually it is quite efficient.
Do you by chance have a lot of duplicate nodes in your data?
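
One way to answer that question before importing is to count how many IDs occur more than once. This is only a sketch: it assumes the ID is the first comma-separated column, that every record sits on a single line, and that the header lives in a separate file. The function name and the sample rows are made up for illustration.

```shell
# Count distinct IDs that appear more than once in a CSV file.
# Assumes: ID in column 1, comma-separated, one record per line, no header row.
count_duplicate_ids() {
    # LC_ALL=C makes sort compare raw bytes (faster, locale-independent).
    cut -d, -f1 "$1" | LC_ALL=C sort | uniq -d | wc -l
}

# Demo on a throwaway file with hypothetical rows: "a" and "b" are repeated.
demo=$(mktemp)
printf '%s\n' 'a,Address' 'b,Address' 'a,Address' 'c,Address' 'b,Address' > "$demo"
count_duplicate_ids "$demo"
```

For the demo file this reports 2 repeated IDs. On the real data it would be called as `count_duplicate_ids "$DATA/addresses.csv"`; a large count there points at de-duplication as the slow step.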

Btw., you can pass the main label/rel-type directly on the command line.
You could try to configure less heap (e.g. 2G) and leave the rest of the memory for off-heap use.
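
As a rough sketch of both suggestions applied to the command from the first post: the label and relationship-type names (`Address`, `Block`, `Transaction`, `BEFORE`, `BELONGS_TO`, `RECEIVES`, `SENDS`) are guesses based on the file names, the default `HEADERS`/`DATA` paths are placeholders, and the `RUN` dry-run switch is just a convenience added here. `HEAP_SIZE` is the environment variable the neo4j-admin script reads for the JVM heap.

```shell
# Sketch only. The label/type names (Address, Block, Transaction, BEFORE, ...)
# and the default HEADERS/DATA paths are assumptions, not taken from the thread.
HEADERS=${HEADERS:-./headers}
DATA=${DATA:-./data}

# Cap the JVM heap for this run; the neo4j-admin script reads HEAP_SIZE.
export HEAP_SIZE=2g

# RUN defaults to "echo", so the command is only printed (dry run).
# Call the script with RUN= (empty) to really start the import.
RUN=${RUN-echo}

run_import() {
    $RUN ./bin/neo4j-admin import \
        --mode=csv \
        --database=btctest.db \
        --nodes:Address "$HEADERS/addresses-header.csv,$DATA/addresses.csv" \
        --nodes:Block "$HEADERS/blocks-header.csv,$DATA/blocks.csv" \
        --nodes:Transaction "$HEADERS/transactions-header.csv,$DATA/transactions.csv" \
        --relationships:BEFORE "$HEADERS/before_rel-header.csv,$DATA/before_rel.csv" \
        --relationships:BELONGS_TO "$HEADERS/belongs_to_rel-header.csv,$DATA/belongs_to_rel.csv" \
        --relationships:RECEIVES "$HEADERS/receives_rel-header.csv,$DATA/receives_rel.csv" \
        --relationships:SENDS "$HEADERS/sends_rel-header.csv,$DATA/sends_rel.csv" \
        --ignore-missing-nodes=true \
        --ignore-duplicate-nodes=true \
        --multiline-fields=true \
        --high-io=true
}

run_import
```

With the label and type given on the command line, the `:LABEL` and `:TYPE` columns should no longer be needed in the CSV files, so the two example headers would shrink to `txid:ID` and `:START_ID,value,:END_ID`.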

Can you get the "c" and "i" outputs in that stage?

wallerprogramm · March 2, 2019, 4:07pm

Only addresses.csv can contain duplicates. I'll try to remove the duplicates before importing the data.
Yes, I realised that just after I had started generating the CSV files.
"c" and "i" work in that stage. This is the output ("i"):

It seems that the index creation is the problem. Might this be because I use quite large IDs (64 characters)? If that is the case, I should probably rethink whether I really need an index on these IDs, and instead introduce smaller IDs as the actual IDs and add the 64-character IDs as unindexed properties.
Is it true that the index preparation cannot be done in parallel? That would explain why only one core is used in that stage of the import.

michael.hunger · March 3, 2019, 11:00am

It does not create an index at this stage. It's probably just the de-duplication.

I'll ask the devs about it. This is really not what it should look like; your whole import should be done in a few minutes.

Also, if you look at your memory information, it seems there is not much available.

michael.hunger · March 4, 2019, 8:19am

Answer from the team.

There are some special cases where the sorting in there isn't particularly optimal and only one thread gets the majority of the work.
So the solution for him would be to de-duplicate upfront with unix tools for the time being.
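
A minimal sketch of such an upfront de-duplication with standard Unix tools follows. It assumes the node ID is the first comma-separated column, that every record is on a single line (so no multiline fields in this particular file), and that the header is kept in a separate file. The function name and the sample rows are hypothetical.

```shell
# Drop duplicate node rows before running the import.
# Assumes: ID in column 1, comma-separated, one record per line, no header row.
dedupe_by_id() {
    # $1 = input CSV, $2 = output CSV
    # LC_ALL=C compares raw bytes, which is faster and locale-independent.
    # -t, -k1,1 restricts the sort key to the first column; -u keeps one row per key.
    LC_ALL=C sort -t, -k1,1 -u "$1" > "$2"
}

# Tiny demo on a throwaway directory with hypothetical rows.
tmp=$(mktemp -d)
printf '%s\n' 'addr1,Address' 'addr2,Address' 'addr1,Address' > "$tmp/addresses.csv"
dedupe_by_id "$tmp/addresses.csv" "$tmp/addresses.dedup.csv"
wc -l < "$tmp/addresses.dedup.csv"
```

In the demo, two of the three rows remain. For the real data the call would look like `dedupe_by_id "$DATA/addresses.csv" "$DATA/addresses.dedup.csv"`, with `--nodes` then pointed at the de-duplicated file. `sort` spills to disk for large inputs, so it should cope with files bigger than RAM, at the cost of changing the row order.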

Sorry for that, it's something we're going to address going forward.

wallerprogramm · March 4, 2019, 11:29am

Removing the duplicates in advance did it. The import now took less than 30 minutes. Thank you for your help.
