I am currently ingesting millions of lines of data into my graph and looking for the most stable and performant way to do so. I have already settled on batching my data and feeding it in via UNWIND, so the raw throughput side is pretty much solved. In a first test environment I simply ran
session.run(self.statement, batch=batch)
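For context, the batch construction itself is just plain chunking. The helper below is a sketch of that setup; the Cypher text and property names are assumptions for illustration, not my actual statement:

```python
# Hypothetical sketch of the batching setup. The UNWIND statement and
# the "id" property are placeholders, not the real ingestion query.
STATEMENT = """
UNWIND $batch AS row
MERGE (n:Node {id: row.id})
"""

def make_batches(rows, size):
    """Split an iterable of rows into lists of at most `size` items."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch

# Each batch is then passed as a single query parameter, e.g.:
#   session.run(STATEMENT, batch=batch)
batches = list(make_batches([{"id": i} for i in range(10)], size=4))
```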
and it processed roughly 30-40 batches per second. I then tried to make it more robust by wrapping it in a write_transaction, as follows:
def commit_batch(tx, batch):
    return tx.run(self.statement, batch=batch)

session.write_transaction(commit_batch, batch)
and the throughput drops to roughly 10 batches per second. Since I am processing 160k batches, this is a real problem. I would be very happy if someone could point me to further optimizations, or explain why performance drops that drastically inside write_transaction.
PS: I am only merging and creating nodes and relationships; no RETURN statement is included. Still, I have to collect the returned result objects and occasionally call results.consume() to keep my DB from crashing due to a full outgoing buffer. Is there any way to circumvent this so that the query really returns nothing into the buffer? It seems like quite a waste of resources. Thanks again!
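For concreteness, the workaround described above looks roughly like this. To keep the snippet self-contained (no database needed), FakeSession/FakeResult are toy stand-ins for the driver's Session/Result; only the consume() call mirrors the real neo4j API, and the drain interval is arbitrary:

```python
# Sketch of the current workaround: periodically draining result streams
# so buffered records do not accumulate. FakeSession/FakeResult are
# stand-ins for the neo4j driver objects, used here only for illustration.

class FakeResult:
    def __init__(self):
        self.buffered = ["record"]   # pretend one record sits in the buffer

    def consume(self):
        self.buffered.clear()        # drain the buffer, like neo4j's consume()
        return "summary"

class FakeSession:
    def run(self, statement, **params):
        return FakeResult()

session = FakeSession()
statement = "UNWIND $batch AS row MERGE (n:Node {id: row.id})"

results = []
for i, batch in enumerate([[{"id": j}] for j in range(5)]):
    results.append(session.run(statement, batch=batch))
    if (i + 1) % 3 == 0:             # every few batches...
        for r in results:
            r.consume()              # ...discard buffered records
        results.clear()

leftover = results                   # batches since the last drain remain buffered
```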