Can we find any benchmarking figures for the neo4j-spark-connector (DataFrame to DB)?

pradeepramamurthi · April 22, 2020, 3:27pm

I am wondering if anybody can point me to benchmarking figures for Spark DataFrame writes to a Neo4j database using the neo4j-spark-connector.

I am currently using the following versions on a 60-core / 60-executor cluster:

- Neo4j 3.5
- neo4j-java-driver-1.7.2.jar
- Spark 2.4.0

Using Neo4jDataFrame.mergeEdgeList(), I have tried batch sizes of 10k, 20k and 40k.
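A quick way to check whether batch size can matter at all is to count how many transactions each partition would commit. The sketch below rests on an assumption that does not come from the connector's documentation: rows are spread evenly across partitions, and each partition commits its share in chunks of at most `batch_size` rows. The helper `transactions_per_partition` is hypothetical, and the 100 partitions match the 0/100 task count reported below.

```python
import math

def transactions_per_partition(total_rows: int, partitions: int, batch_size: int) -> int:
    # Hypothetical helper: ASSUMES rows are spread evenly over partitions and
    # each partition commits its share in chunks of at most batch_size rows.
    rows_per_partition = math.ceil(total_rows / partitions)
    return math.ceil(rows_per_partition / batch_size)

# 1M rows over 100 partitions, for each batch size tried in the question.
for batch_size in (10_000, 20_000, 40_000):
    txs = transactions_per_partition(1_000_000, 100, batch_size)
    print(f"batch_size={batch_size}: {txs} transaction(s) per partition")
```

Under these assumptions, every setting yields a single transaction per partition (10,000 rows each). Raising the batch size from 10k to 40k would therefore not be expected to change much, and the time is more likely spent inside each transaction.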

However, it seems to take an unreasonable amount of time.

100k records take about 35 minutes. For a million records, the job seemed to hang for more than 14 hours: there is no progress in the Spark UI, and all tasks show 0/100.
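Converting those timings into a rate makes the gap easier to see. The arithmetic below uses only the figures in this post. The helper names are illustrative.

```python
def records_per_second(records: int, seconds: float) -> float:
    # Average write rate over the whole run.
    return records / seconds

def projected_seconds(records: int, rate: float) -> float:
    # Linear extrapolation: assumes the rate stays constant as the data grows.
    return records / rate

observed_rate = records_per_second(100_000, 35 * 60)           # 100k records in 35 min
linear_estimate = projected_seconds(1_000_000, observed_rate)  # 1M records at that rate
print(f"{observed_rate:.1f} records/s; 1M records -> {linear_estimate / 3600:.1f} h")
```

At roughly 47.6 records/s, a linear extrapolation puts 1M records at about 5.8 hours (21,000 s). A run that is still stalled after 14 hours is therefore slower than a steady rate would explain. This hints, though it does not prove, that something other than raw throughput is involved, such as contention between tasks writing to the same nodes.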

What write rates should I expect when loading a Neo4j database with the Spark connector, and what is the best way to optimise loads of larger DataFrames (containing millions of records)?

Thanks
Shiva

david_allen · November 12, 2020, 12:31pm

Neo4j has a new approach to the Spark connector, which can be found here. It includes architectural guidance for getting the best performance.

It's hard to say exactly what performance each user will get, because it depends heavily on your data model and setup. But we have seen tens of thousands of node writes per second on moderate hardware, for nodes with roughly 10 properties each, when written using the "normalized loading" approach documented on that page.
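As we understand it, the normalized loading idea is to stop sending one wide row per relationship. Instead, the data is split into deduplicated node sets plus a slim relationship set, and the nodes are written before the relationships. The plain-Python sketch below illustrates only the splitting step, on in-memory rows. The column names (`src_id`, `src_name`, `dst_id`, `dst_name`, `since`) and the helper `normalize` are hypothetical. In Spark, the same split would more typically be done with DataFrame `select` and `dropDuplicates` than with Python loops.

```python
from typing import Dict, List, Tuple

# Hypothetical flat edge-list row: one row per relationship, with the source
# and target node properties repeated on every row that touches them.
Row = Dict[str, str]

def normalize(rows: List[Row]) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split denormalized edge rows into deduplicated source nodes, deduplicated
    target nodes, and relationships that reference nodes only by key."""
    sources: Dict[str, Row] = {}
    targets: Dict[str, Row] = {}
    rels: List[Row] = []
    for r in rows:
        # Keep the first copy of each node, keyed by id; later repeats are skipped.
        sources.setdefault(r["src_id"], {"id": r["src_id"], "name": r["src_name"]})
        targets.setdefault(r["dst_id"], {"id": r["dst_id"], "name": r["dst_name"]})
        # The relationship row keeps only the two keys plus its own property.
        rels.append({"src_id": r["src_id"], "dst_id": r["dst_id"], "since": r["since"]})
    return list(sources.values()), list(targets.values()), rels

edges = [
    {"src_id": "p1", "src_name": "Ann", "dst_id": "c1", "dst_name": "Acme", "since": "2019"},
    {"src_id": "p1", "src_name": "Ann", "dst_id": "c2", "dst_name": "Globex", "since": "2020"},
    {"src_id": "p2", "src_name": "Bob", "dst_id": "c1", "dst_name": "Acme", "since": "2018"},
]
people, companies, works_at = normalize(edges)
print(len(people), len(companies), len(works_at))
```

The design intent is that each node is written once, ideally against a unique constraint or index on its key. The relationship pass then only has to look nodes up by key, rather than merging full node rows for every edge.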
