
Performance issue while doing a LOAD CSV

igorbmartins
Node Clone

Hi everyone, how are you doing?

I would like a little help. I am doing a study for my company to evaluate the possibility of using Neo4j as our first NoSQL database.

First I created a small use case with 10 nodes and 12 relationships and only a few records. After seeing that this model could answer our questions, I decided to go to the second step: the performance check.

To check the performance, I am planning to do it in a few phases:

  • MERGE – Insert or update nodes using LOAD CSV;
  • MERGE – Create relationships using LOAD CSV;
  • MATCH – Select the nodes that answer the questions from our use cases;

After this first impression, we would like to use Pentaho to run the same processes and check the performance.

So, what is my problem? During my first MERGE test, I am observing great slowness even with a data volume that I consider low. Below are the times, followed by a command sample.

Rows     Method                              Time
1.000    MERGE, empty model (INSERT only)    < 1 second
1.000    MERGE, full model (UPDATE only)     < 1 second
10.000   MERGE, empty model (INSERT only)    00:00:21
10.000   MERGE, full model (UPDATE only)     00:01:00
50.000   MERGE, empty model (INSERT only)    00:16:48
50.000   MERGE, full model (UPDATE only)     00:29:11
100.000  MERGE, empty model (INSERT only)    01:00:44
100.000  MERGE, full model (UPDATE only)     02:19:00


Command Sample

LOAD CSV WITH HEADERS FROM 'file:///File.csv' AS row

MERGE (p:Node {4 Keys})

ON CREATE SET 22 attributes

ON MATCH SET 22 attributes;
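
For larger files, a batched variant of this command usually helps, so the whole import is not one giant transaction. This is a sketch assuming Neo4j 4.4+ (older versions use USING PERIODIC COMMIT instead); key1–key4 and prop1 are placeholders for the real 4 keys and 22 attributes:

```cypher
// Commit every 10,000 rows instead of holding the whole import in one transaction.
// key1..key4 and prop1 are placeholder property names.
LOAD CSV WITH HEADERS FROM 'file:///File.csv' AS row
CALL {
  WITH row
  MERGE (p:Node {key1: row.key1, key2: row.key2, key3: row.key3, key4: row.key4})
  ON CREATE SET p.prop1 = row.prop1   // ...remaining attributes
  ON MATCH  SET p.prop1 = row.prop1   // ...remaining attributes
} IN TRANSACTIONS OF 10000 ROWS;
```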


Environment

Neo4j Desktop (CPU: i5 2.11 GHz, RAM: 16 GB, disk: SSD) – I am planning to install a server edition later…;


As you can see, the performance is not linear. Inserting 1.000 rows takes less than 1 second, so when I insert 10 times more (10.000) I expected something near 10 seconds, but it takes 1 minute. Then 50.000 rows again takes longer than I expected: 16 minutes.

I then repeated the test with relationships, and the performance was even worse.

This performance is not acceptable for us, because I will insert more than 20.000.000 (twenty million) nodes.

Could you help me with some tips? I am sure I am doing something wrong, because I know that many large companies around the world use Neo4j.

Regards, Igor Martins

5 REPLIES

glilienfield
Ninja

Have you created an index on the properties you are merging on? MERGE performs a match first.
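
For example (assuming the label is Node and key1 is one of your merge keys; names are placeholders):

```cypher
// Single-property index so the match phase of MERGE becomes an index seek
// instead of a label scan. Index name and property are placeholders.
CREATE INDEX node_key1 IF NOT EXISTS FOR (n:Node) ON (n.key1);
```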

igorbmartins
Node Clone

Thanks glilienfield.
In my first test I didn't create any index, because I thought it would not make any difference for an insert. But I forgot that it is a MERGE, so it first checks whether the node exists, and there the index makes a difference.
After your message I created the index and it works much better. Now I will try to load more data, from 1.000.000 to 20.000.000 rows.

Regards, Igor Martins

igorbmartins
Node Clone

Hi everyone, it is me again.
After creating the index, the MERGE into an empty database works fine: inserting 1.000.000 rows takes 34 seconds. After that I executed the same command again to check the performance in update mode, but unfortunately the process ran for more than 10 hours, so I decided to stop it.

Do you have some tip?

Regards, Igor Bastos Martins

You state your merge is on four keys. Are they all needed to uniquely identify each node? Did you create a composite index on them, or separate indexes?
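
A composite index would look like this (property names are placeholders for your four keys):

```cypher
// One composite index covering all four merge keys, so the MERGE lookup
// can be answered by a single index seek instead of four separate lookups.
CREATE INDEX node_keys IF NOT EXISTS
FOR (n:Node) ON (n.key1, n.key2, n.key3, n.key4);
```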

Can you try running your import with EXPLAIN and share the results? Maybe that will help identify the bottleneck and show whether your indexes are being used.
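
Something like this (again with placeholder key names); in the plan, a NodeIndexSeek operator is what you want to see, while NodeByLabelScan or AllNodesScan means the index is not being used:

```cypher
// EXPLAIN shows the query plan without executing the import.
EXPLAIN
LOAD CSV WITH HEADERS FROM 'file:///File.csv' AS row
MERGE (p:Node {key1: row.key1, key2: row.key2, key3: row.key3, key4: row.key4});
```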

igorbmartins
Node Clone

Hi, after looking for some performance recommendations I found some notes about 3 parameters:

dbms.memory.heap.initial_size
dbms.memory.heap.max_size
dbms.memory.pagecache.size
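
For reference, this is the kind of setting I changed in neo4j.conf; the values below are only an illustration for a 16 GB machine, not a recommendation for every setup:

```
dbms.memory.heap.initial_size=4g
dbms.memory.heap.max_size=4g
dbms.memory.pagecache.size=6g
```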

After changing their values, the process works fine in both INSERT and UPDATE mode.

Thank you for your time and for helping me.

Regards, Igor Martins