Hi everyone, how are you doing?
Guys, I would like a little help. I am doing a study for my company to evaluate whether we can use Neo4j as our first NoSQL database.
First, I created a small use case with 10 nodes and 12 relationships and a few records. After seeing that this model could answer our questions, I decided to move on to the second step, the performance check.
To check the performance, I am planning to do it in phases:
- MERGE – Insert or update nodes using LOAD CSV;
- MERGE – Create relationships using LOAD CSV;
- MATCH – Select the nodes that answer the questions from our use cases.
After this first impression, we would like to use Pentaho to run the same processes and check the performance.
So, what is my problem? During my first MERGE test, I am seeing severe slowness even with a data volume that I consider low. Below are the times, followed by a sample of the command.
Rows: 1,000 | Method: MERGE into empty model (INSERT only) | Time: less than 1 second
Rows: 1,000 | Method: MERGE into full model (UPDATE only) | Time: less than 1 second
Rows: 10,000 | Method: MERGE into empty model (INSERT only) | Time: 00:00:21
Rows: 10,000 | Method: MERGE into full model (UPDATE only) | Time: 00:01:00
Rows: 50,000 | Method: MERGE into empty model (INSERT only) | Time: 00:16:48
Rows: 50,000 | Method: MERGE into full model (UPDATE only) | Time: 00:29:11
Rows: 100,000 | Method: MERGE into empty model (INSERT only) | Time: 01:00:44
Rows: 100,000 | Method: MERGE into full model (UPDATE only) | Time: 02:19:00
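To make these times easier to compare, here is a small Python sketch that converts each measured time into rows per second. It is only arithmetic on the numbers above, not part of the test itself. The 1,000-row runs are left out because I only know that they took less than one second, and the names `to_seconds`, `throughput` and `runs` are just illustrative.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def throughput(rows: int, hms: str) -> float:
    """Average rows processed per second for one run."""
    return rows / to_seconds(hms)


# (rows, method, elapsed time) copied from the measurements above
runs = [
    (10_000, "insert", "00:00:21"),
    (10_000, "update", "00:01:00"),
    (50_000, "insert", "00:16:48"),
    (50_000, "update", "00:29:11"),
    (100_000, "insert", "01:00:44"),
    (100_000, "update", "02:19:00"),
]

for rows, method, hms in runs:
    print(f"{rows:>7} rows {method}: {throughput(rows, hms):7.1f} rows/s")
```

If the load scaled linearly, the rows-per-second value would stay roughly constant across the runs.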
Command Sample
LOAD CSV WITH HEADERS FROM 'file:///File.csv' AS row
MERGE (p:Node {4 Keys})
ON CREATE SET 22 attributes
ON MATCH SET 22 attributes;
Environment
Neo4j Desktop (CPU: i5 2.11 GHz, RAM: 16 GB, disk: SSD) – I am planning to install a server version later…
As you can see, the performance is not linear. When I insert 1,000 rows, it takes less than 1 second. When I insert 10 times more (10,000 rows), I was expecting something near 10 seconds, but it took 21 seconds for the insert and 1 minute for the update. When I insert 50,000 rows, it again takes much longer than I expected: almost 17 minutes.
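One way to put a number on "not linear" is to estimate the exponent k in time ≈ rows^k between two consecutive runs. The Python sketch below does that with the measured times, converted to seconds. The function names are only illustrative, and this describes the measurements without explaining their cause. With linear scaling k would be close to 1, while a value near 2 would suggest roughly quadratic growth.

```python
import math


def growth_exponent(rows_a, secs_a, rows_b, secs_b):
    """Exponent k such that time grows like rows**k between two runs."""
    return math.log(secs_b / secs_a) / math.log(rows_b / rows_a)


def exponents(runs):
    """Growth exponent for each pair of consecutive (rows, seconds) runs."""
    return [growth_exponent(*a, *b) for a, b in zip(runs, runs[1:])]


# (rows, elapsed seconds) taken from the measured times
insert_runs = [(10_000, 21), (50_000, 1008), (100_000, 3644)]
update_runs = [(10_000, 60), (50_000, 1751), (100_000, 8340)]

for name, runs in (("insert", insert_runs), ("update", update_runs)):
    formatted = ", ".join(f"{k:.2f}" for k in exponents(runs))
    print(f"{name}: k = {formatted}")
```

For my numbers, k comes out well above 1 for every pair of runs, which matches what I am seeing.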
Afterwards I ran the same test with relationships, and the performance was even worse.
This performance is not acceptable for me, because I will need to insert more than 20,000,000 (twenty million) nodes.
Could you help me with some tips? I am sure that I am doing something wrong, because I know that many large companies around the world are using Neo4j.
Regards, Igor Martins