I tried to ingest my edges and nodes from a Delta file into a Neo4j database using the Spark connector, but it gets slower and slower: the first 4 million edges took 1 hour, and it keeps degrading from there.
I ingested 130 million nodes in 6 hours, yet I see other people ingest their billions of nodes and edges in 1-2 hours. What did I do wrong here?
There are a ton of reasons that can contribute to slowing down the process.
The first thing to check is the query.log, to understand which queries are slow.
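For reference, the query log is controlled from `neo4j.conf`. A minimal fragment, assuming Neo4j 4.x setting names (they changed in 5.x):

```properties
# neo4j.conf -- enable the query log and record only slow queries
dbms.logs.query.enabled=true
# log queries that take longer than this threshold
dbms.logs.query.threshold=500ms
```

With a threshold set, the log stays small and only the problematic batches show up.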
@santand84 called it out. We have a graph of ~32M nodes / 1.7B edges that we load from Apache Spark. We've had to work our way through quite a number of performance issues on the loading side, mostly by tuning the batch size and the partitioning/executor count.
The bigger issue we run into with large loads, where there is significant overlap in relationship/node coverage, is lock contention on nodes from parallel/concurrent transactions.
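To make the lock-contention point concrete, here is a minimal pure-Python sketch of the idea (function names are hypothetical, and a real job would express this as a Spark repartition plus the connector's batching, not hand-rolled loops): if all edges sharing a source node go to the same writer, concurrent transactions never fight over that node's lock, and each partition is then written in fixed-size batches.

```python
from collections import defaultdict


def partition_edges_by_source(edges, num_partitions):
    """Group edges so that all edges with the same source node land in the
    same partition. Concurrent writers then never contend for the same
    source node's lock. (Target-node locks can still overlap across
    partitions, so this reduces, but does not eliminate, contention.)"""
    partitions = defaultdict(list)
    for src, dst in edges:
        partitions[hash(src) % num_partitions].append((src, dst))
    return [partitions[i] for i in range(num_partitions)]


def batches(rows, batch_size):
    """Yield fixed-size chunks; each chunk would become one transaction,
    e.g. fed to a Cypher statement like:
      UNWIND $rows AS row
      MATCH (a:Node {id: row[0]}), (b:Node {id: row[1]})
      MERGE (a)-[:REL]->(b)
    """
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]


edges = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
parts = partition_edges_by_source(edges, num_partitions=2)
chunks = list(batches(parts[1], batch_size=2))
```

The same shape applies in Spark: repartition the relationship DataFrame by the source key, and let the connector's batch size control transaction granularity.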
@brianmartin the best practice for batch importing data with Spark is: