I want to discuss an ETL question: I am moving data from PostgreSQL to Neo4j using a Databricks Spark DataFrame, but it is taking a lot of time. Which approach would be more suitable?


Is this (A) a one-time ETL job, or rather (B) a continuous synchronisation?

For both: do as much of the transformation and de-duplication as possible in Databricks, to minimize redundant work on the database side. You are still writing to a database with transaction overhead, so find the right batch size and don't expect any magic numbers. If you create 50-100k nodes per second, that is reasonable; if you are way below that, you are probably missing a node key constraint/index or have some other issue. A rough sketch follows below.
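A minimal PySpark sketch of that pattern, assuming the Neo4j Connector for Apache Spark is installed on the cluster. Hostnames, credentials, the `customers` table, the `Customer` label and the `customer_id` key are placeholders; check the option names against your connector version.

```python
# Read the source table from PostgreSQL via JDBC (placeholders throughout).
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://<pg-host>:5432/<db>")
    .option("dbtable", "public.customers")
    .option("user", "<pg-user>")
    .option("password", "<pg-password>")
    .load()
)

# Do the heavy lifting (de-duplication, cleanup) on the Spark side,
# so Neo4j only has to apply clean rows.
clean = df.dropDuplicates(["customer_id"]).select("customer_id", "name", "email")

# Before writing, create a uniqueness/node key constraint in Neo4j, e.g.:
#   CREATE CONSTRAINT customer_id IF NOT EXISTS
#   FOR (c:Customer) REQUIRE c.customer_id IS UNIQUE;

(
    clean.write.format("org.neo4j.spark")
    .mode("Overwrite")                                # upsert by node.keys
    .option("url", "neo4j://<neo4j-host>:7687")
    .option("authentication.basic.username", "neo4j")
    .option("authentication.basic.password", "<password>")
    .option("labels", ":Customer")
    .option("node.keys", "customer_id")
    .option("batch.size", 10000)                      # tune; no magic number
    .save()
)
```

With the constraint in place, each MERGE is an index lookup instead of a label scan, which is usually the difference between hundreds and tens of thousands of nodes per second.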

For (B), depending on the other workload on the database, you may want to optimize for stability instead of speed/throughput, i.e. reduce the batch size and reduce concurrency, as in the sketch below.
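An illustrative variant for the continuous-sync case: fewer concurrent writers and smaller batches keep lock contention and transaction pressure low while other workload runs against the database. `incremental_df` and all values are placeholders, not recommendations.

```python
(
    incremental_df.coalesce(2)                        # fewer concurrent transactions
    .write.format("org.neo4j.spark")
    .mode("Overwrite")
    .option("url", "neo4j://<neo4j-host>:7687")
    .option("authentication.basic.username", "neo4j")
    .option("authentication.basic.password", "<password>")
    .option("labels", ":Customer")
    .option("node.keys", "customer_id")
    .option("batch.size", 2000)                       # smaller batches for stability
    .save()
)
```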