What is the best Neo4j deployment to create Graph Database in Neo4j with huge volumes of data (~250GB)?

kostas_vavouris · September 25, 2021, 1:35pm

Hi,
I want to write Python code to manage huge volumes of bibliographic data. Specifically, I want to write and efficiently execute Python code that checks dataset quality, reads huge volumes of bibliographic data in streaming JSON format, and produces CSV files from them. From the CSV files I then want to quickly create a Neo4j database that will be accessible over the internet. I would also like to use GPU runtimes to speed up execution. What infrastructure can handle this kind of workload efficiently? The whole workload must run in a fully automated way.

Thanks in advance for your time!

michael.hunger · September 28, 2021, 8:46pm

That's a lot of questions:

GPU: there is currently no special treatment of GPUs.
CSV: you can use neo4j-admin import to quickly create databases in bulk from huge CSV files. It also supports GZ compression.
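
As a rough illustration of the JSON-to-CSV step that feeds neo4j-admin import, here is a minimal sketch. It assumes the input is newline-delimited JSON (one record per line) and that each record has `id` and `title` fields; those field names, the `Paper` label and the function name are hypothetical and not from this thread. The header row follows the neo4j-admin import conventions (`:ID`, `:LABEL`), and the output is gzip-compressed.

```python
import csv
import gzip
import json


def json_lines_to_import_csv(json_path, nodes_csv_path):
    """Stream a JSON Lines file and write a gzip-compressed node CSV.

    Returns the number of node rows written.
    """
    count = 0
    with open(json_path, encoding="utf-8") as src, \
            gzip.open(nodes_csv_path, "wt", encoding="utf-8", newline="") as dst:
        writer = csv.writer(dst)
        # Header in neo4j-admin import style: an ID column and a label column.
        # "paperId" and "title" are assumed property names.
        writer.writerow(["paperId:ID", "title", ":LABEL"])
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Very basic quality check: drop records without an id.
            if not record.get("id"):
                continue
            writer.writerow([record["id"], record.get("title", ""), "Paper"])
            count += 1
    return count
```

Because the file is read line by line, memory use stays flat regardless of input size. The resulting file would then be handed to neo4j-admin import through its `--nodes` option; the exact command form depends on your Neo4j version.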

Python and data import.
The Python driver is not the fastest driver for high volumes, but it should be good enough if you have some patience.
Some people have had more success, in terms of throughput, using the HTTP API from Python.

They even created a separate library: Announcing neo4j-connector 1.0.0 (python 3.5+)
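
If you stay with the official Python driver, one common way to get more throughput is to send rows in batches and let Cypher `UNWIND` them, rather than running one query per row. The sketch below is only an assumption about how that could look: the Cypher statement, the `Paper` label, the property names and the batch size are placeholders. `run` can be any callable that takes a query string and a parameter dict, for example `session.run` from the driver.

```python
from itertools import islice

# Hypothetical Cypher statement; label and property names are placeholders.
MERGE_PAPERS = """
UNWIND $rows AS row
MERGE (p:Paper {paperId: row.id})
SET p.title = row.title
"""


def batched(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def load_in_batches(run, rows, batch_size=10_000):
    """Call run(query, parameters) once per batch; return rows sent."""
    total = 0
    for batch in batched(rows, batch_size):
        run(MERGE_PAPERS, {"rows": batch})
        total += len(batch)
    return total


# Usage with the official driver (needs a running Neo4j instance;
# URI and credentials below are placeholders):
#
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("bolt://localhost:7687",
#                                 auth=("neo4j", "password"))
#   with driver.session() as session:
#       load_in_batches(session.run, rows)
#   driver.close()
```

A uniqueness constraint or index on the merged property usually matters a lot for `MERGE` performance at this scale. For the initial 250GB load, neo4j-admin import is still likely to be much faster than any driver-based approach.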

Otherwise, for queries on data quality etc., I suggest the Cypher query tuning course to make sure your queries are always fast.

kostas_vavouris · September 28, 2021, 9:13pm

Thank you very much. Your tips are very valuable!
