Correct way to ingest millions of results?

Rogie
Node Link

I'm having trouble debugging what's going on in my workflow:

from neo4j import GraphDatabase

def get_results(uri):
    q = " ... my query ..."
    driver = GraphDatabase.driver(uri, auth=("neo4j", "pass"))
    db = driver.session()
    with db.begin_transaction() as tx:
        res = tx.run(q)
        for r in res:
            process_res(r)
        tx.success = True  # mark the transaction for commit on exit

The for loop seems to hang at random after processing a few hundred thousand results. My process_res() function is simple enough that I don't think it's causing any problems.

Is this the correct way to ingest millions of results, or is there a better way?

2 REPLIES

You should take care regarding transaction sizes. Typically 10k to 100k atomic operations (like creating a node or setting a property) make a good transaction size. If you're way above that, you might exhaust transaction state memory.

Either use client-side transaction batching, or take a look at apoc.periodic.iterate, which does the batching on the Neo4j server itself.
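For the server-side route, a minimal sketch calling apoc.periodic.iterate from the Python driver might look like this; the URI, the MATCH and SET queries, and the batch size are illustrative placeholders to adapt:

from neo4j import GraphDatabase

uri = "bolt://localhost:7687"  # adjust to your instance

driver = GraphDatabase.driver(uri, auth=("neo4j", "pass"))
with driver.session() as session:
    # The first query streams the rows to process; the second runs against
    # each row and is committed every batchSize rows on the server itself,
    # so no single transaction holds millions of operations.
    session.run(
        "CALL apoc.periodic.iterate("
        "  'MATCH (n:Item) RETURN n',"
        "  'SET n.processed = true',"
        "  {batchSize: 10000, parallel: false})"
    )
driver.close()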

Cobra
Ninja

Hello @Rogie

I wrote a little example that loads data into your database; you can adapt it to your use case.
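A minimal sketch of a batched loader along those lines, assuming a CSV input file with id and name columns, an Item node label, and a 10k batch size; all of these are illustrative assumptions to adapt:

import csv
from neo4j import GraphDatabase

URI = "bolt://localhost:7687"  # adjust to your instance
BATCH_SIZE = 10_000            # within the 10k-100k guideline above

# UNWIND writes a whole batch in a single parameterized query.
QUERY = "UNWIND $rows AS row CREATE (n:Item {id: row.id, name: row.name})"

def load_file(path):
    driver = GraphDatabase.driver(URI, auth=("neo4j", "pass"))
    with driver.session() as session, open(path, newline="") as f:
        batch = []
        for row in csv.DictReader(f):
            batch.append(row)
            if len(batch) >= BATCH_SIZE:
                # One auto-commit transaction per batch keeps
                # transaction state memory bounded.
                session.run(QUERY, rows=batch)
                batch = []
        if batch:
            session.run(QUERY, rows=batch)
    driver.close()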

Regards,
Cobra