CSV import issue

anshulchaintha7 · June 14, 2023, 2:38pm

Hello,
I am not able to import data from CSV files.
The query takes too long to execute.
There are 1 million nodes, and I want to create relationships between them. I am running the query in Neo4j Browser, but it does not complete within the time limit.
What can I do to make it run faster?

glilienfield · June 14, 2023, 4:52pm

Can you include your query and sample of the data files?

Are you using a batch approach, such as apoc.periodic.commit/iterate or Cypher CALL in transactions?

Do you have indexes on the properties used in your MATCH/MERGE statements?

anshulchaintha7 · June 14, 2023, 5:15pm

This is the query:
LOAD CSV WITH HEADERS FROM "file:///file.csv" as row MATCH (from:A {Name: row.AID}), (to:B {Name: row.BID})
Create (from)-[:CONNECT {UniqueID: row.UniqueID})->(to)

Labels A and B each have 3 properties.
I am not using apoc.

dana_canzano · June 15, 2023, 10:20am

@anshulchaintha7

What version of Neo4j?
Do you have indexes defined for :A(Name) and :B(Name)?
Have you tried using periodic commit on the LOAD CSV, similar to what is described at LOAD CSV - Cypher Manual? (Note this is Neo4j v5 syntax, so it is not applicable if you are using v4.x or v3.x.)

anshulchaintha7 · June 15, 2023, 11:19am

I have not defined any indexes yet.
A.csv has 3 properties and B.csv has 3 properties.

Community edition 4.3.3

dana_canzano · June 15, 2023, 1:07pm

@anshulchaintha7

Neo4j, like most RDBMSs, benefits from indexes, especially for queries that search for nodes, which is what the MATCH clauses you have defined do.

Please create indexes for labels :A and :B on the property Name. See Indexes for search performance - Cypher Manual

Also see LOAD CSV - Cypher Manual for how to have LOAD CSV commit every N rows.

anshulchaintha7 · June 15, 2023, 6:26pm

@dana_canzano
Please suggest a quicker query.
I have already posted the LOAD CSV query in this thread.
Please use that query as a reference for your suggestion.

dana_canzano · June 15, 2023, 9:50pm

@anshulchaintha7
Per my last update, indexes are described at Indexes for search performance - Cypher Manual

Specifically, you should create the indexes as

CREATE INDEX AName FOR (n:A) ON (n.Name);
CREATE INDEX BName FOR (n:B) ON (n.Name);

This is similar to what is described in the doc at Indexes for search performance - Cypher Manual

As to why an index is suggested (and this is true for Neo4j as well as any RDBMS): without an index, we scan all nodes with the label :A and check which node has the property Name = row.AID. If you have 100k :A nodes, then for each row in the CSV we examine 100k nodes. If your CSV has 100 rows of data, that is 100 x 100k = 10 million node checks. With an index, for each row in the CSV we use the index to find the node that has Name = row.AID, which is much faster.
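One way to check whether a lookup is served by an index is to look at its query plan. The following is a minimal sketch, assuming the AName index from the statements above has been created; the value 'some-id' is a placeholder and not taken from the data.

```cypher
// List the existing indexes; the state column should read ONLINE
// before the index can be used by the planner.
SHOW INDEXES;

// EXPLAIN only produces the plan, nothing is executed.
// With an index on :A(Name) the plan should contain a NodeIndexSeek operator;
// without one, it falls back to NodeByLabelScan followed by a Filter.
EXPLAIN
MATCH (from:A {Name: 'some-id'})
RETURN from;
```

If the plan still shows a label scan after the index is ONLINE, the label or property name in the query probably does not match the one the index was created on.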

As to changing LOAD CSV, this too is described in the doc (LOAD CSV - Cypher Manual), per my prior update, so try

LOAD CSV WITH HEADERS FROM "file:///file.csv" as row
call {
            with row
            MATCH (from:A {Name: row.AID}), (to:B {Name: row.BID})
            Create (from)-[:CONNECT {UniqueID: row.UniqueID})->(to)
        }  IN TRANSACTIONS OF 100 ROWS
;

anshulchaintha7 · June 17, 2023, 12:06pm

It's taking too much time.
Is there a way I can execute the query in the background and wait for its completion?

glilienfield · June 17, 2023, 1:13pm

@dana_canzano has provided two important changes to improve your performance, indexing and batching. Your query is very basic. Those two changes should really help. Did you try them without success?

You can use the cypher-shell CLI tool to execute the query.

Is this a new database that you are loading with this CSV file? If so, you can try neo4j-admin import.

anshulchaintha7 · June 17, 2023, 3:03pm

I am getting an error mentioned below:
Invalid input 'IN': expected
"CALL"
"CREATE"
"DELETE"
"DETACH"
"FOREACH"
"LOAD"
"MATCH"
"MERGE"
"OPTIONAL"
"REMOVE"
"RETURN"
"SET"
"UNION"
"UNWIND"
"USE"
"WITH"
(line 7, column 12 (offset: 260))
" } IN TRANSACTIONS OF 100 ROWS"

glilienfield · June 17, 2023, 3:47pm

Can you paste the entire query?

anshulchaintha7 · June 17, 2023, 4:00pm

LOAD CSV WITH HEADERS FROM "file:///SManageV.csv" as row
call {
with row
MATCH (from:S {Name: row.SID}), (to:V {Name: row.VID})
Create (from)-[:Managed_By {UniqueID: row.UniqueID})->(to)
} IN TRANSACTIONS OF 100 ROWS
;

glilienfield · June 17, 2023, 4:11pm

You terminated your relationship with a ')' instead of a ']'.

LOAD CSV WITH HEADERS FROM "file:///SManageV.csv" as row
call {
with row
MATCH (from:S {Name: row.SID}), (to:V {Name: row.VID})
Create (from)-[:Managed_By {UniqueID: row.UniqueID}]->(to)
} IN TRANSACTIONS OF 100 ROWS

dmcanzano · June 18, 2023, 12:03am

@anshulchaintha7

Also, my earlier response suggested creating an index on :A and :B labeled nodes. However, your last update includes a query which uses nodes with labels :S and :V.

Do you have indexes on these labels as well?

anshulchaintha7 · June 18, 2023, 1:57am

Yes @dmcanzano @glilienfield
I have created indexes for both the labels and used the query
:auto LOAD CSV WITH HEADERS FROM "file:///ServerManageVM.csv" as row

CALL {

WITH row

MATCH (From: Server {Name: row.ServerID}), (to: VM {Name: row.VMID}) Create (from)-[:MANAGED BY {Unique ID: row. Unique ID}]->(to) } IN TRANSACTIONS OF 100 ROWS

glilienfield · June 18, 2023, 2:36am

Are you still having issues?

anshulchaintha7 · June 18, 2023, 2:49am

Yes
Same performance
Let me try the neo4j-admin tool.
I want to run the query in the background without an internet connection.
Will it work?

dana_canzano · June 18, 2023, 1:07pm

@anshulchaintha7

"Same performance"

Details? Does it take 10s, 120s, 5 minutes?
How many rows are in the CSV?

your query of

:auto LOAD CSV WITH HEADERS FROM "file:///ServerManageVM.csv" as row

CALL {

WITH row

MATCH (From: Server {Name: row.ServerID}), (to: VM {Name: row.VMID}) Create (from)-[:MANAGED BY {Unique ID: row. Unique ID}]->(to) } IN TRANSACTIONS OF 100 ROWS

will not run as-is, because MANAGED BY and Unique ID contain spaces, and relationship types and property names cannot contain spaces unless they are quoted with backticks.
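Strictly speaking, it is unquoted names that cannot contain spaces; names wrapped in backticks can. A sketch of the same statement with quoted names, assuming the CSV header really is "Unique ID" with a space (the actual header is not shown in this thread):

```cypher
LOAD CSV WITH HEADERS FROM "file:///ServerManageVM.csv" AS row
// Variable names are case-sensitive, so "from" is spelled the same in MATCH and CREATE.
MATCH (from:Server {Name: row.ServerID}), (to:VM {Name: row.VMID})
// Backticks allow spaces in the relationship type, the property key and the CSV column name.
CREATE (from)-[:`MANAGED BY` {`Unique ID`: row.`Unique ID`}]->(to)
```

Renaming to MANAGED_BY and Unique_ID avoids the need for quoting altogether.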

Also this is once again changing. Initially it was :A and :B labels, then :S and :V labels and now :Server and :VM?

Can you run and return the results of

:auto profile LOAD CSV WITH HEADERS FROM 'http://www.neo4j.com/ServerManageVM.csv' as row

CALL {

WITH row

MATCH (from:Server {Name: row.ServerID}), (to:VM {Name: row.VMID}) Create (from)-[:MANAGED_BY {Unique_ID: row.Unique_ID}]->(to) } IN TRANSACTIONS OF 100 ROWS

which will provide a query plan as to how the query is processed. Are the indexes being utilized?

anshulchaintha7 · June 19, 2023, 2:42am

30 minutes
16666 rows
Getting same error: Invalid input 'IN': expected
yes
FYI I am using community edition 4.3.3
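
The "Invalid input 'IN'" error is consistent with that version: CALL { ... } IN TRANSACTIONS was introduced in Neo4j 4.4, so 4.3.3 does not parse it. On 4.3 the batching equivalent is USING PERIODIC COMMIT. A minimal sketch, assuming the :Server and :VM labels, the file name, and the Unique_ID column from the earlier posts; the batch size of 1000 is picked only for illustration:

```cypher
:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///ServerManageVM.csv" AS row
// Both lookups can use an index if one exists on :Server(Name) and :VM(Name).
MATCH (from:Server {Name: row.ServerID}), (to:VM {Name: row.VMID})
// One relationship per CSV row; a commit happens after every 1000 rows.
CREATE (from)-[:MANAGED_BY {Unique_ID: row.Unique_ID}]->(to)
```

The :auto prefix is needed in Neo4j Browser so the statement runs as an auto-commit transaction; in cypher-shell, leave it out. USING PERIODIC COMMIT is deprecated in 4.4 and removed in 5.x, where the CALL { ... } IN TRANSACTIONS form applies instead.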
