Hello,
I am not able to import the data using CSV files.
It is taking too much time to execute the query.
There are 1 million nodes. I want to create relationships between them. I am executing the query in Neo4j Browser, but it is not completing within the time period.
What can I do to execute it quickly?
Can you include your query and sample of the data files?
Are you using a batch approach, such as apoc.periodic.commit/iterate or Cypher CALL in transactions?
Do you have indexes on the properties used in your MATCH/MERGE statements?
This is the query:
LOAD CSV WITH HEADERS FROM "file:///file.csv" as row MATCH (from:A {Name: row.AID}), (to:B {Name: row.BID})
Create (from)-[:CONNECT {UniqueID: row.UniqueID})->(to)
Labels A and B each have 3 properties.
I am not using apoc.
What version of Neo4j?
Do you have indexes defined for :A(Name) and :B(Name)?
Have you tried using periodic commit on the LOAD CSV, similar to what is described at LOAD CSV - Cypher Manual? (Note this is Neo4j v5 syntax, so if you are using v4.x or v3.x it is not applicable.)
I have defined no indexes yet.
A.csv has 3 properties; B.csv has 3 properties.
Community edition 4.3.3
Neo4j, like most RDBMSs, benefits from indexes, especially for queries that search for nodes by property value, and the MATCH statements you have defined do exactly that.
Please create indexes on property Name for labels :A and :B. See Indexes for search performance - Cypher Manual.
Also see LOAD CSV - Cypher Manual for how to have LOAD CSV commit every N rows.
@dana_canzano
Please suggest a quicker query.
I have already posted the LOAD CSV query in this thread.
Please use that query as a reference for your suggestion.
@anshulchaintha7
Per my last update, indexes are described at Indexes for search performance - Cypher Manual.
Specifically, you should create the indexes as
CREATE INDEX AName FOR (n:A) ON (n.Name);
CREATE INDEX BName FOR (n:B) ON (n.Name);
This is similar to what is described in the doc at Indexes for search performance - Cypher Manual.
As to why an index is suggested (and again, this is true for Neo4j as well as any RDBMS): without an index, we scan all nodes with label :A and check which node has property Name = row.AID. If you have 100k :A nodes, then for each row in the CSV we examine 100k nodes. If your CSV has 100 rows of data, that is 100 x 100k = 10 million node checks. With an index, for each row in the CSV we use the index to find the node which has Name = row.AID, which is much faster.
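Before re-running the import it may be worth confirming that the indexes exist and have finished populating. A minimal sketch, assuming Neo4j 4.2 or later (where SHOW INDEXES is available) and the index names AName and BName used above:

```cypher
// List all indexes. AName and BName should appear with
// state ONLINE before the import is started.
SHOW INDEXES;

// Optionally block until all indexes are online
// (the argument is a timeout in seconds).
CALL db.awaitIndexes(300);
```

An index that is still populating cannot be used by the planner, so starting the import too early can look like the index made no difference.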
As to changing LOAD CSV:
this too is described in the doc, per my prior update (LOAD CSV - Cypher Manual), so try
LOAD CSV WITH HEADERS FROM "file:///file.csv" as row
call {
with row
MATCH (from:A {Name: row.AID}), (to:B {Name: row.BID})
Create (from)-[:CONNECT {UniqueID: row.UniqueID})->(to)
} IN TRANSACTIONS OF 100 ROWS
;
It's taking too much time.
Is there a way I can execute the query in the background and wait for its completion?
@dana_canzano has provided two important changes to improve your performance, indexing and batching. Your query is very basic. Those two changes should really help. Did you try them without success?
You can use the cypher-shell CLI tool to execute the query.
Is this a new database that you are loading with this CSV file? If so, you can try neo4j-admin import.
I am getting an error mentioned below:
Invalid input 'IN': expected
"CALL"
"CREATE"
"DELETE"
"DETACH"
"FOREACH"
"LOAD"
"MATCH"
"MERGE"
"OPTIONAL"
"REMOVE"
"RETURN"
"SET"
"UNION"
"UNWIND"
"USE"
"WITH"
(line 7, column 12 (offset: 260))
" } IN TRANSACTIONS OF 100 ROWS"
Can you paste the entire query?
LOAD CSV WITH HEADERS FROM "file:///SManageV.csv" as row
call {
with row
MATCH (from:S {Name: row.SID}), (to:V {Name: row.VID})
Create (from)-[:Managed_By {UniqueID: row.UniqueID})->(to)
} IN TRANSACTIONS OF 100 ROWS
;
You terminated your relationship with a ')' instead of a ']'.
LOAD CSV WITH HEADERS FROM "file:///SManageV.csv" as row
call {
with row
MATCH (from:S {Name: row.SID}), (to:V {Name: row.VID})
Create (from)-[:Managed_By {UniqueID: row.UniqueID}]->(to)
} IN TRANSACTIONS OF 100 ROWS
Also, my earlier response suggested creating indexes on :A and :B labeled nodes. However, your last update includes a query which uses nodes with labels :S and :V.
Do you have indexes on these labels as well?
Yes @dmcanzano @glilienfield
I have created indexes for both the labels and used the query
:auto LOAD CSV WITH HEADERS FROM "file:///ServerManageVM.csv" as row
CALL {
WITH row
MATCH (From: Server {Name: row.ServerID}), (to: VM {Name: row.VMID}) Create (from)-[:MANAGED BY {Unique ID: row. Unique ID}]->(to) } IN TRANSACTIONS OF 100 ROWS
Are you still having issues?
Yes
Same performance
Let me try the Neo4j admin tool.
I want to run the query in the background, without an internet connection.
Will it work?
Same performance
Details? Does it take 10s, 120s, 5 minutes?
How many rows are in the CSV?
your query of
:auto LOAD CSV WITH HEADERS FROM "file:///ServerManageVM.csv" as row
CALL {
WITH row
MATCH (From: Server {Name: row.ServerID}), (to: VM {Name: row.VMID}) Create (from)-[:MANAGED BY {Unique ID: row. Unique ID}]->(to) } IN TRANSACTIONS OF 100 ROWS
will not run as-is, because MANAGED BY and Unique ID contain spaces, and relationship types and property names cannot contain spaces unless they are quoted with backticks.
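For illustration, the two usual ways around the spaces are underscores or backtick quoting. A small sketch, where 'server-1', 'vm-1' and 'example-id' are made-up values and not taken from the actual data:

```cypher
// Option 1: avoid spaces altogether by using underscores.
MATCH (from:Server {Name: 'server-1'}), (to:VM {Name: 'vm-1'})
CREATE (from)-[:MANAGED_BY {Unique_ID: 'example-id'}]->(to);

// Option 2: keep the spaces, but quote each name with backticks.
MATCH (from:Server {Name: 'server-1'}), (to:VM {Name: 'vm-1'})
CREATE (from)-[:`MANAGED BY` {`Unique ID`: 'example-id'}]->(to);
```

The same quoting applies to CSV headers: if the header in the file really is "Unique ID", the value would be read as row.`Unique ID`.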
Also this is once again changing. Initially it was :A and :B labels, then :S and :V labels and now :Server and :VM?
Can you run and return the results of
:auto profile LOAD CSV WITH HEADERS FROM 'http://www.neo4j.com/ServerManageVM.csv' as row
CALL {
WITH row
MATCH (from:Server {Name: row.ServerID}), (to:VM {Name: row.VMID})
CREATE (from)-[:MANAGED_BY {Unique_ID: row.Unique_ID}]->(to)
} IN TRANSACTIONS OF 100 ROWS
which will provide a query plan as to how the query is processed. Are the indexes being utilized?
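A lighter-weight check than profiling the whole import is to EXPLAIN a single lookup of the kind the import performs. A minimal sketch, with 'server-1' as a made-up value:

```cypher
// EXPLAIN shows the plan without executing the query.
// With a usable index on :Server(Name) the plan should start with
// NodeIndexSeek. NodeByLabelScan followed by a Filter means every
// :Server node is scanned for each CSV row.
EXPLAIN
MATCH (from:Server {Name: 'server-1'})
RETURN from;
```

The same check can be repeated for :VM. Note also that Cypher variable names are case-sensitive, so a pattern that binds From and then creates from (from)would not connect the matched node.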
30 minutes
16666 rows
Getting same error: Invalid input 'IN': expected
yes
FYI I am using community edition 4.3.3
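On 4.3.x the CALL { ... } IN TRANSACTIONS form is not available (it was introduced in Neo4j 4.4), which would explain the Invalid input 'IN' error. The batching equivalent for that version is USING PERIODIC COMMIT. A sketch, assuming the CSV headers are ServerID, VMID and Unique_ID (the exact header names are an assumption) and that indexes on :Server(Name) and :VM(Name) already exist:

```cypher
:auto USING PERIODIC COMMIT 1000
// Commits after every 1000 rows instead of holding the whole
// import in a single transaction.
LOAD CSV WITH HEADERS FROM "file:///ServerManageVM.csv" AS row
// Two separate index-backed lookups, one per endpoint.
MATCH (from:Server {Name: row.ServerID})
MATCH (to:VM {Name: row.VMID})
CREATE (from)-[:MANAGED_BY {Unique_ID: row.Unique_ID}]->(to);
```

The :auto prefix is only needed in Neo4j Browser; in cypher-shell the statement would start at USING PERIODIC COMMIT.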