We want to create more than 20 million nodes in Neo4j. While pushing the node data into Neo4j we observed that database performance was quite slow, so we tried the following:
We used the Py2neo client library and toolkit for working with Neo4j.
- We used the graph.create() API to create nodes without any transaction-control APIs.
- We built a single query to create all the nodes and ran it with graph.run(), but this also took a lot of time.
- We then used the transaction API: graph.begin() returns a transaction object, we inserted the nodes with create(), and finally called the commit API on the transaction object. We were able to commit when the file contained a small number of nodes, but for large numbers of nodes the manual commit failed. To work around this we used the auto-commit APIs provided by py2neo and closed the transaction manually at the end of the code, but performance was still not as good as expected.
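For reference, a batched variant of the approaches above can be sketched as follows. The Person label, the row/property shape, the batch size, and the connected graph object are all assumptions for illustration; the chunking helper is plain Python, and each graph.run() call here runs as its own transaction:

```python
from itertools import islice

def chunked(items, size):
    """Yield successive lists of at most `size` items from `items`."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# One parameterized Cypher statement per batch instead of one giant
# query or one CREATE per node (label and property shape are assumed).
UNWIND_QUERY = """
UNWIND $rows AS row
CREATE (n:Person)
SET n = row
"""

def insert_in_batches(graph, rows, batch_size=10_000):
    # `graph` is assumed to be a connected py2neo Graph instance;
    # sending bounded batches keeps transaction memory under control.
    for batch in chunked(rows, batch_size):
        graph.run(UNWIND_QUERY, rows=batch)
```

The idea is that each UNWIND statement creates thousands of nodes in a single round trip, which is usually far cheaper than one transaction (or one query) per node.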
What is the best practice for inserting millions of nodes at a time into a Neo4j graph database?
Does importing the node data from a CSV file affect the speed of the Neo4j graph database?
To import CSV files into Neo4j we currently have to keep them in Neo4j's local import directory; can we instead load CSV files stored on our own local drive?
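To make the CSV question concrete, the kind of import we are asking about would look roughly like the Cypher below (the file name, label, and batch size are placeholders; the batched-transaction syntax assumes a recent Neo4j version):

```python
# Sketch of a LOAD CSV import, kept as a Cypher string so it can be
# sent via py2neo or the Neo4j browser. 'nodes.csv' and the Person
# label are hypothetical placeholders.
LOAD_CSV_QUERY = """
LOAD CSV WITH HEADERS FROM 'file:///nodes.csv' AS row
CALL {
    WITH row
    CREATE (n:Person)
    SET n = row
} IN TRANSACTIONS OF 10000 ROWS
"""
```

As we understand it, file:/// URLs are by default resolved only inside the server's configured import directory, so loading from an arbitrary local drive would require changing the server configuration or serving the file over http(s):// instead; confirmation of this would be appreciated.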
Thank you in advance.