Importing UTF-8 characters problem

folterj
Node Clone

After exporting UTF-8 characters to Cypher using APOC (which works correctly), they are not imported correctly into Neo4j.

A simple reproducible example - file 'utf8_test.cypher' (confirmed format as 'UTF-8' encoding):

:begin
CREATE CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT (node.`UNIQUE IMPORT ID`) IS UNIQUE;
:commit
:begin
UNWIND [{_id:0, properties:{x:"μ"}}] AS row
CREATE (n:`UNIQUE IMPORT LABEL`{`UNIQUE IMPORT ID`: row._id}) SET n += row.properties SET n:Test;
:commit
:begin
MATCH (n:`UNIQUE IMPORT LABEL`)  WITH n LIMIT 20000 REMOVE n:`UNIQUE IMPORT LABEL` REMOVE n.`UNIQUE IMPORT ID`;
:commit
:begin
DROP CONSTRAINT ON (node:`UNIQUE IMPORT LABEL`) ASSERT (node.`UNIQUE IMPORT ID`) IS UNIQUE;
:commit

command line:

bin\cypher-shell -u neo4j -p [password] < import\utf8_test.cypher

query:

MATCH (x:Test) RETURN x.x

returns:

"μ"

Any feedback appreciated.

neo4j 4.1.1
neo4j Desktop 1.3.4

11 Replies

Cobra
Ninja

Hello @folterj

Can you try this on your database?

CREATE (n:Test {title: "μ"});
MATCH (n:Test) RETURN n.title

On my database, I get the right result.

Regards,
Cobra

Hi @Cobra,

Yes, running this from the browser works fine. We have a database with UTF-8 characters in it, which we can export correctly as well. However, importing the exported Cypher does not work.

As I understand it, this is the best way to import: our APOC-exported Cypher has millions of nodes/relationships, is optimally batched, and uses parameterised UNWIND (and even then the import takes a surprisingly long time).

koji
Ninja

Hi @folterj

I created utf8_test.cypher based on the text from begin to commit.
Then I used the same cypher-shell command you did.

It works correctly.


Are you on Windows?

My operating environment:
macOS Catalina 10.15.6
Neo4j 4.1.1
Neo4j Desktop 1.3.4 (1.3.4.27)

Cobra
Ninja

Could we see it?

A unique constraint and UNWIND will make the load faster.
What are the specs of the database and the computer?
Can we see how you load nodes and relationships?

Regards,
Cobra

folterj
Node Clone

Hi, thanks for the quick responses.

@koji, I'm using Windows 10 64-bit. If you copied the text from the post above, then the 2-byte UTF-8 character should have been copied correctly from this page (I just checked), which should reproduce the problem. So it seems this might be an issue specific to Windows.

@Cobra, my initial post shows exactly how to reproduce the issue, including its result (on Windows 10 64-bit). Regarding optimisation, from experience it seems CREATE is faster than MERGE. We create separate files for nodes and for relationships (the latter indeed benefits from indexing), using unwind_batch_params and automatic batching. APOC already uses our internal unique uuids, so no generated _id fields are added to the exported Cypher. The batching makes a huge difference, and also avoids neo4j becoming overwhelmed and crashing (well, it works before version 4.1 anyway). But it's still slower than I expected. We use cypher-shell for loading. But this is a different topic from the issue of this post.

Cobra
Ninja

Hello @folterj

I'm a bit confused: I just executed your queries on a database and they work correctly.
The correct value is returned (I'm on Windows).

Regards,
Cobra

Hi @Cobra,

Thanks, that's interesting, did you use cypher-shell from the command line for the import as well?
Also, I assume you have the same version of neo4j & Windows 10 64-bit?
If so, maybe it's something specific to my local set-up, as I seem to keep hitting problems. neo4j uses its own Java distribution that comes with the installation, so that's all fine I assume. I can try to completely remove neo4j, including any settings in other locations etc., and reinstall once more.

Joost.

Cobra
Ninja

No, I did not, so if there is a problem, I presume it comes from that. I just created a local database in Neo4j Desktop and executed your queries there. I hope it helps.

Hi @Cobra,

The import is the main issue: our exported Cypher is about 1.3GB, which is not feasible, let alone optimal, to paste into the browser and execute as a query.
We're looking into this avenue as we're currently not able to import a dump file from 4.0.4 to 4.1.x without neo4j crashing persistently. We hope to avoid this issue using cypher import, though apart from the UTF8 and automatic float to int conversion in APOC, we have not been able to find a working solution this way either unfortunately. But that's again another issue (Upgrade fails from Enterprise 4.0.4 to 4.1.0)

Cobra
Ninja

What is in your exported Cypher?

Looks like you've got a lot of problems.

sebastien
Node

Hello,
I can definitely confirm that cypher-shell for Windows does not read UTF-8 encoded files properly. Please see this reported GitHub issue.
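
To illustrate the kind of corruption this can produce, here is a minimal Python sketch. It assumes the file's UTF-8 bytes get decoded with a single-byte Windows code page; cp1252 is used here purely as an assumption, since the actual code page depends on the system locale.

```python
# Assumption: the UTF-8 file is decoded with a legacy Windows code page
# (cp1252 here) instead of UTF-8. The real code page depends on the locale.
original = "\u03bc"                     # GREEK SMALL LETTER MU, as in the test file
utf8_bytes = original.encode("utf-8")   # two bytes: 0xCE 0xBC
misread = utf8_bytes.decode("cp1252")   # each byte is decoded as its own character

print(utf8_bytes.hex())   # the raw bytes stored in the file
print(misread)            # two characters instead of one
```

Under that assumption, the single two-byte character comes back as two separate characters, which is the typical symptom of a UTF-8 file being read with the wrong encoding.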

So far the only workaround I have found is to open my Cypher input file in Notepad++ and use "Encoding > Convert to ANSI". Then cypher-shell correctly processes my file and I can see the special characters in my neo4j browser.
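
A possible alternative, sketched here rather than tested against cypher-shell on Windows: since Cypher string literals accept `\uXXXX` escapes, the file could be rewritten as pure ASCII so that the encoding used to read it no longer matters. The helper names `escape_non_ascii` and `convert_file` and the output file name are hypothetical. Note that this escapes every non-ASCII character in the file, so it is only meant for files where such characters appear inside string literals.

```python
def escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with \\uXXXX escape(s)."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # Go through UTF-16 so characters beyond U+FFFF become surrogate pairs.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("\\u%04X" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


def convert_file(src: str, dst: str) -> None:
    """Read a UTF-8 file and write an ASCII-only copy with escaped characters."""
    # newline="" keeps the original line endings untouched.
    with open(src, encoding="utf-8", newline="") as fin, \
            open(dst, "w", encoding="ascii", newline="") as fout:
        for line in fin:
            fout.write(escape_non_ascii(line))


# Example call (the output name is made up for illustration):
# convert_file("import/utf8_test.cypher", "import/utf8_test_ascii.cypher")
```

With the test file from the first post, the property map `{x:"μ"}` would become `{x:"\u03BC"}`. Whether surrogate-pair escapes for characters beyond U+FFFF are handled as intended is an assumption worth checking on a small sample before converting a large export.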