I am currently ingesting millions of lines of data into my graph and looking for the most stable and performant way to do so. I have already settled on batching my data and feeding it in via UNWIND, so the raw throughput side is pretty much solved. In a first test environment I simply ran
session.run(self.statement, batch=batch)
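For context, the batch construction itself is just plain chunking. The helper below is a sketch of that setup; the Cypher text and property names are assumptions for illustration, not my actual statement:

```python
# Hypothetical sketch of the batching setup. The UNWIND statement and
# the "id" property are placeholders, not the real ingestion query.
STATEMENT = """
UNWIND $batch AS row
MERGE (n:Node {id: row.id})
"""

def make_batches(rows, size):
    """Split an iterable of rows into lists of at most `size` items."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch

# Each batch is then passed as a single query parameter, e.g.:
#   session.run(STATEMENT, batch=batch)
batches = list(make_batches([{"id": i} for i in range(10)], size=4))
```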
and it processed roughly 30-40 batches per second. I then tried to make it more robust by wrapping it in a write_transaction, as follows:
def commit_batch(tx, batch):
    return tx.run(self.statement, batch=batch)

session.write_transaction(commit_batch, batch)
and the throughput drops to roughly 10 batches per second. Since I am processing 160k batches, this is a real problem. I would be very happy if someone could point me to further optimizations, or explain why performance drops that drastically inside write_transaction.
PS: I am only merging and creating nodes and relationships; no RETURN statement is included. Still, I have to collect the returned result objects and occasionally call results.consume() to keep my DB from crashing due to a full outgoing buffer. Is there any way to circumvent this so that the query really returns nothing into the buffer? It seems like quite a waste of resources. Thanks again!
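For concreteness, the workaround described above looks roughly like this. To keep the snippet self-contained (no database needed), FakeSession/FakeResult are toy stand-ins for the driver's Session/Result; only the consume() call mirrors the real neo4j API, and the drain interval is arbitrary:

```python
# Sketch of the current workaround: periodically draining result streams
# so buffered records do not accumulate. FakeSession/FakeResult are
# stand-ins for the neo4j driver objects, used here only for illustration.

class FakeResult:
    def __init__(self):
        self.buffered = ["record"]   # pretend one record sits in the buffer

    def consume(self):
        self.buffered.clear()        # drain the buffer, like neo4j's consume()
        return "summary"

class FakeSession:
    def run(self, statement, **params):
        return FakeResult()

session = FakeSession()
statement = "UNWIND $batch AS row MERGE (n:Node {id: row.id})"

results = []
for i, batch in enumerate([[{"id": j}] for j in range(5)]):
    results.append(session.run(statement, batch=batch))
    if (i + 1) % 3 == 0:             # every few batches...
        for r in results:
            r.consume()              # ...discard buffered records
        results.clear()

leftover = results                   # batches since the last drain remain buffered
```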