I am wondering if anybody can point me to benchmark figures for Spark DataFrame writes to a Neo4j database using neo4j-spark-connector.
I am currently running the following versions on a 60-core / 60-executor cluster:
Neo4j 3.5
neo4j-java-driver-1.7.2.jar
Spark 2.4.0
Using Neo4jDataFrame.mergeEdgeList(), I have tried batch sizes of 10k, 20k and 40k.
However, the write takes an unreasonable amount of time.
Writing 100k records takes about 35 minutes. With a million records, the job appeared to hang for more than 14 hours: there seems to be no progress in the Spark UI, and all tasks show 0/100.
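To put those numbers in perspective, here is a quick back-of-the-envelope sketch in Python. It is only arithmetic on the figures above. The helper names are made up for illustration, and the batch counts assume that each batch is sent to the database as a single request, which I have not verified against the connector.

```python
import math


def rows_per_second(rows: int, minutes: float) -> float:
    """Average write rate for a load that took the given number of minutes."""
    return rows / (minutes * 60.0)


def batches_needed(rows: int, batch_size: int) -> int:
    """Number of batches required to cover all rows (assumes one request per batch)."""
    return math.ceil(rows / batch_size)


# Observed: 100k records in about 35 minutes.
observed_rate = rows_per_second(100_000, 35)
print(f"observed rate: {observed_rate:.1f} rows/s")

# If the same rate held linearly, how long would one million records take?
projected_hours = 1_000_000 / observed_rate / 3600
print(f"projected time for 1M rows: {projected_hours:.1f} hours")

# Batches per load for the batch sizes tried so far.
for batch_size in (10_000, 20_000, 40_000):
    print(
        f"batch size {batch_size}: "
        f"{batches_needed(100_000, batch_size)} batches for 100k rows, "
        f"{batches_needed(1_000_000, batch_size)} batches for 1M rows"
    )
```

So the 100k run averages under 50 rows per second. At that rate a million records should finish in roughly six hours, which is why a run of more than 14 hours with no task progress looks like a hang rather than just a slow load.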
What write rates should I expect to a Neo4j database through the Spark connector, and what is the best way to optimise writes of larger DataFrames (containing millions of records) so that they load faster?
Thanks
Shiva