Has the neo4j-admin import tool changed in recent versions?

inersha
Node Clone

I've recently upgraded from 4.3.3 to 4.4.2.

On 4.3.3 I used the neo4j-admin import tool to import some data into a fresh database. However, since upgrading to 4.4.2, running the same tool on the same CSV files inserts incorrect data into Neo4j.

I can't pinpoint what's going on exactly, but I'm getting the wrong data in some fields, and duplicate nodes where there weren't any before.

I downgraded to 4.3.3 and the import tool works as expected, so I'm pretty sure it's to do with one of the newer versions.

I don't have an isolated example yet, as there is quite a lot of data being imported. Before I go digging: have there been any major changes to the way the neo4j-admin import tool parses CSV files in recent versions that could cause data to be imported incorrectly when it worked fine in 4.3.3?

Thanks in advance for any advice!

1 ACCEPTED SOLUTION

inersha
Node Clone

It appears that this is a known issue: neo4j-admin import generated duplicate id's after upgrade to 4.4 · Issue #12793 · neo4j/neo4j · GitH...

"I found the issue and a fix for it. It should be included in the next 4.4.x release."


2 REPLIES

inersha
Node Clone

I think the problem is to do with the values in the :ID column of the nodes.

My CSV file looks like this:

hash:ID,height:INT,version:INT,prevblock:STRING
000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f,0,1,0000000000000000000000000000000000000000000000000000000000000000
00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048,1,1,000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
000000006a625f06636b8bb6ac7b960a8d03705d1ace08b1a19da3fdcc99ddbd,2,1,00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048
0000000082b5015589a3fdf2d4baff403e6f0be035a5d9742c1cae6295464449,3,1,000000006a625f06636b8bb6ac7b960a8d03705d1ace08b1a19da3fdcc99ddbd
000000004ebadb55ee9096c9a2f8880e09da59c0d68b1c228da88e48844a1485,4,1,0000000082b5015589a3fdf2d4baff403e6f0be035a5d9742c1cae6295464449
000000009b7262315dbf071787ad3656097b892abffd1f95a1a022f896f533fc,5,1,000000004ebadb55ee9096c9a2f8880e09da59c0d68b1c228da88e48844a1485
000000003031a0e73735690c5a1ff2a4be82553b2a12b776fbd3a215dc8f778d,6,1,000000009b7262315dbf071787ad3656097b892abffd1f95a1a022f896f533fc
0000000071966c2b1d065fd446b1e485b2c9d9594acd2007ccbd5441cfc89444,7,1,000000003031a0e73735690c5a1ff2a4be82553b2a12b776fbd3a215dc8f778d
00000000408c48f847aa786c2268fc3e6ec2af68e8468a34a28c61b7f1de0dc6,8,1,0000000071966c2b1d065fd446b1e485b2c9d9594acd2007ccbd5441cfc89444

In Neo4j 4.3.3 this file imports without any issues. In 4.4.2, however, some nodes end up with a repeated :ID value instead of their own correct one.

If I do not mark this column as an :ID, the values are imported correctly. For some reason Neo4j does not seem to like these values as :IDs.
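For anyone who wants to rule out their source data first, here is a minimal Python sketch that checks whether a node CSV already contains repeated values in its `:ID` column. The function name `find_duplicate_ids` is made up for illustration, and the in-memory sample just reuses the first two rows of the file above. It only inspects the CSV; it says nothing about what the import tool does internally.

```python
import csv
import io
from collections import Counter


def find_duplicate_ids(csv_file):
    """Return {id_value: count} for every :ID value that appears more than once.

    `csv_file` is any iterable of CSV lines, e.g. an open file or io.StringIO.
    """
    reader = csv.reader(csv_file)
    header = next(reader)

    # Find the first column marked as an ID, e.g. "hash:ID" or "hash:ID(Block)".
    id_index = next(
        (i for i, name in enumerate(header) if ":ID" in name), None
    )
    if id_index is None:
        raise ValueError("no :ID column found in header")

    # Count each ID value, skipping completely empty lines.
    counts = Counter(row[id_index] for row in reader if row)
    return {value: n for value, n in counts.items() if n > 1}


# Illustrative sample: the header and first two rows from the post.
sample = io.StringIO(
    "hash:ID,height:INT,version:INT,prevblock:STRING\n"
    "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f,0,1,"
    "0000000000000000000000000000000000000000000000000000000000000000\n"
    "00000000839a8e6886ab5951d76f411475428afc90947ee320161bbf18eb6048,1,1,"
    "000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f\n"
)
print(find_duplicate_ids(sample))
```

For a real file, pass an open handle instead, for example `open("blocks.csv", newline="")`, where `blocks.csv` is a placeholder name. An empty result means the IDs in the CSV are unique, so any duplicates seen after the import were not in the source data.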

Is this a bug in Neo4j?

EDIT: I've submitted this as an issue on GitHub just in case: Neo4j-admin import not importing :ID fields correctly. · Issue #12808 · neo4j/neo4j · GitHub

