Correct way to ingest millions of results?

I'm having trouble debugging what's going on in my workflow:

from neo4j import GraphDatabase

def get_results(uri):
    q = " ... my query ..."
    driver = GraphDatabase.driver(uri, auth=("neo4j", "pass"))
    db = driver.session()
    with db.begin_transaction() as tx:
        res = tx.run(q)
        tx.success = True
        for r in res:
            process_res(r)

The for loop seems to randomly hang after processing a few hundred thousand results. My process_res() function is simple enough that I don't think it's causing any problems.

Is this the correct way to ingest millions of results, or is there a better way?

You should take care regarding transaction sizes. Typically 10k to 100k atomic operations (like creating a node or setting a property) make a good transaction size. If you're way above that, you might exhaust transaction state memory.

Either use client-side transaction batching, or take a look at apoc.periodic.iterate, which does the batching on the Neo4j server itself.
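As a rough sketch of the server-side option, you can call apoc.periodic.iterate from the Python driver so that Neo4j commits in batches itself (this assumes APOC is installed; the URI, credentials, labels and the two inner queries below are just placeholders for your own):

from neo4j import GraphDatabase

URI = "bolt://localhost:7687"    # placeholder
AUTH = ("neo4j", "pass")         # placeholder credentials

# Outer statement selects the rows to iterate, inner statement is applied to
# each row, committed server-side in batches of 10k.
BATCHED_QUERY = """
CALL apoc.periodic.iterate(
  'MATCH (n:Item) RETURN n',
  'SET n.processed = true',
  {batchSize: 10000, parallel: false}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
"""

def run_batched():
    with GraphDatabase.driver(URI, auth=AUTH) as driver:
        with driver.session() as session:
            record = session.run(BATCHED_QUERY).single()
            print(record["batches"], record["total"], record["errorMessages"])

Because the iteration and commits happen inside the database, the client only sees one small summary record instead of streaming millions of rows.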

Hello @Rogie :slight_smile:

I wrote a little example to load data into your database; you can adapt it a bit for your use case :slight_smile:
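A minimal sketch of what such a client-side batched loader might look like, assuming you push rows with UNWIND in fixed-size batches (the URI, credentials, BATCH_SIZE, rows iterable, and Cypher are all placeholders to adapt):

from neo4j import GraphDatabase

URI = "bolt://localhost:7687"   # placeholder
AUTH = ("neo4j", "pass")        # placeholder credentials
BATCH_SIZE = 10_000             # keep each transaction in the 10k-100k operation range

LOAD_QUERY = """
UNWIND $rows AS row
MERGE (p:Person {id: row.id})
SET p.name = row.name
"""

def load_rows(rows):
    with GraphDatabase.driver(URI, auth=AUTH) as driver:
        with driver.session() as session:
            batch = []
            for row in rows:
                batch.append(row)
                if len(batch) >= BATCH_SIZE:
                    # each auto-commit run is its own transaction, one per batch
                    session.run(LOAD_QUERY, rows=batch).consume()
                    batch = []
            if batch:
                # flush the final partial batch
                session.run(LOAD_QUERY, rows=batch).consume()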

Regards,
Cobra