Can neo4j-admin import be run multiple times?

Greetings,

Does the import tool allow me to load different sections of the graph one at a time? I want to load a few nodes and relationships at a time to build up the entire graph. Is that possible?

Thanks
Satish

I don't think that is possible. Why do you need that? What are you trying to do?

Thanks

It slows down after loading a few nodes. I have split my data into thirty files of 1 million records each. It slows down halfway through whether I have 30 files, 20 files, or even 1 file.

How can I debug this?

Import starting 2020-05-14 23:15:09.255-0500
  Estimated number of nodes: 6.00 M
  Estimated number of node properties: 31.00 M
  Estimated number of relationships: 5.00 M
  Estimated number of relationship properties: 0.00 
  Estimated disk space usage: 794.5MiB
  Estimated required memory usage: 1.070GiB

(1/4) Node import 2020-05-14 23:15:09.297-0500
  Estimated number of nodes: 6.00 M
  Estimated disk space usage: 632.4MiB
  Estimated required memory usage: 1.070GiB
.......... .......... .......... .......... ..........   5% ∆1s 75ms
.......... .......... .......... .......... ..........  10% ∆204ms
.......... .......... .......... .......... ..........  15% ∆402ms
.......... .......... .......... .......... ..........  20% ∆201ms
.......... .......... .......... .......... ..........  25% ∆400ms
.......... .......... .......... .......... ..........  30% ∆1s 403ms
.......... .........- .......... .......... ..........  35% ∆200ms
.......... .......... .......... .......... ..........  40% ∆1ms
.......... .......... .......... .......... ..........  45% ∆1ms
.......... .......... .......... .......... ..........  50% ∆1s 201ms
.......... .......... .......... .......... ..........  55% ∆4s 804ms
.......... .......... .......... .......... ..........  60% ∆20s 416ms
.......... .......... .......... .......... .......

I have 100 GB of RAM, but the import is hardly using any of it, and the CPU is not fully utilized either.

Thanks

(1/4) Node import 2020-05-14 23:15:09.297-0500
  Estimated number of nodes: 6.00 M
  Estimated disk space usage: 632.4MiB
  Estimated required memory usage: 1.070GiB
.......... .......... .......... .......... ..........   5% ∆1s 75ms
.......... .......... .......... .......... ..........  10% ∆204ms
.......... .......... .......... .......... ..........  15% ∆402ms
.......... .......... .......... .......... ..........  20% ∆201ms
.......... .......... .......... .......... ..........  25% ∆400ms
.......... .......... .......... .......... ..........  30% ∆1s 403ms
.......... .........- .......... .......... ..........  35% ∆200ms
.......... .......... .......... .......... ..........  40% ∆1ms
.......... .......... .......... .......... ..........  45% ∆1ms
.......... .......... .......... .......... ..........  50% ∆1s 201ms
.......... .......... .......... .......... ..........  55% ∆4s 804ms
.......... .......... .......... .......... ..........  60% ∆20s 416ms
.......... .......... .......... .......... ..........  65% ∆10m 16s 392ms
.......... .......... .......... .......... ..........  70% ∆202ms
.......... .......... .......... .......... ..........  75% ∆0ms
.......... .......... .......... .......... ..........  80% ∆0ms
.......... .......... .......... .......... ..........  85% ∆1s 3ms
.......... .......... .......... .......... ..........  90% ∆835ms
.......... .......... .......... .......... ..........  95% ∆801ms
.......... .......... .......... .......... .......... 100% ∆200ms

The step at 65% took about 10 minutes. Is there any particular reason why it takes so long sometimes?

I was confused earlier. You can use LOAD CSV for importing large amounts of data:

https://neo4j.com/docs/cypher-manual/current/clauses/load-csv/#load-csv-importing-large-amounts-of-data

Is that useful for you?

Thanks

Thanks @jggomez.

I am loading the data for the first time, and I tried LOAD CSV with USING PERIODIC COMMIT, but it still was not fast enough.
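For reference, a periodic-commit LOAD CSV run can be scripted through cypher-shell. This is only a sketch: the file name, label, property names, and credentials below are hypothetical, and the CSV must live in the database's import directory.

```
# Sketch only -- nodes.csv, Person, and the property names are made up.
cat <<'EOF' | bin/cypher-shell -u neo4j -p <password>
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///nodes.csv' AS row
CREATE (:Person {id: row.personId, name: row.name});
EOF
```

The periodic commit size (10000 here) is a tuning knob: larger batches mean fewer commits but more memory held per transaction.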

Since I am loading into an empty database, I have the option of the import tool, which can only be used on an empty database; it bypasses the transaction layer, which makes loading faster.
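For context, a typical invocation looks something like the following. This is only a sketch: the database name, labels, relationship type, and file names are hypothetical, and the exact flags vary by Neo4j version (4.x shown). There is also a --max-memory option that may help if the tool is under-using your RAM.

```
# Sketch only -- labels, types, and file names here are made up.
# The tool must be run against an empty database, in one pass.
bin/neo4j-admin import \
  --database=neo4j \
  --max-memory=80% \
  --nodes=Person=persons_header.csv,persons_1.csv,persons_2.csv \
  --relationships=KNOWS=knows_header.csv,knows.csv
```

Note that splitting the data across many CSV files does not change this: all files are still passed to a single run, since the tool cannot be re-run incrementally on the same database.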

But what I am noticing with the import tool is that it runs fast up to 50 or 60 percent and then suddenly slows down.

I tried loading 30 files first, then reduced to 20 files, then to 10, and then to 1. It always stalls at 50 to 60 percent regardless of how many files I am trying to load.

I have a million rows in each file. Is there a suggested number of records per file?

Thanks

Hi, I have never loaded a million rows from a CSV file. Try using CREATE instead of MERGE. Can I see your code?

Thanks

Thanks @jggomez.

I was using one file with all the columns, and there are too many duplicates in the files; I think that is why it is hanging in the middle.
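In case it helps others: duplicate node IDs can be stripped out before running the import (recent versions of the tool also have a --skip-duplicate-nodes flag, though deduplicating up front keeps the input clean). A minimal sketch, assuming a comma-separated node file whose first column is the :ID field; the file name and layout are hypothetical:

```shell
# Hypothetical sample of a node CSV with duplicate IDs (header + rows).
cat > nodes.csv <<'EOF'
personId:ID,name
1,Alice
2,Bob
1,Alice
3,Carol
EOF

# Keep the header line, then drop rows whose ID (first column)
# has already been seen. sort -u with a key keeps one row per ID.
head -n 1 nodes.csv > nodes_dedup.csv
tail -n +2 nodes.csv | sort -t, -k1,1 -u >> nodes_dedup.csv

wc -l < nodes_dedup.csv   # 4 lines: header + 3 unique rows
```

Note that sort -u keeps an arbitrary (first in sort order) row per duplicate key, so this is only safe when duplicate rows are true duplicates, not conflicting records.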

Have you used the import tool?

Wondering if anyone can comment on why it sometimes prints dashes (-) instead of all dots.
Is there any special meaning to those dashes?
Thanks

I can't really help with the main problem; however, I also saw these dashes in random places during imports. I never found any missing data after the load (I imported the same data set into multiple databases and ran the same queries on all of them).