I have code that reads data from the web, cleans it up, and stores it in Neo4j. I'm wondering how to "parallelize" this process, since getting the data from the web can be slow. My current setup is something like this:
In `config.py`:

```python
from neo4j import GraphDatabase

class cfg_holder:
    '''Container for global variables.'''
    def __init__(self, params):
        self.params = params
        self.uri = "bolt://localhost:7687"
        self.driver = GraphDatabase.driver(self.uri, auth=("user", "pass"))
        self.db = self.driver.session()

def init(param):
    return cfg_holder(param)
```
In `main.py`:

```python
import concurrent.futures
import config

def func(h):
    # get data
    # build queries
    # when enough data has been collected:
    with h.db.begin_transaction() as tx:
        tx.run(q)
        tx.success = True

if __name__ == "__main__":
    holders = []
    for i in [10, 20]:
        holders.append(config.init(i))
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        executor.map(func, holders)
```
So each `cfg_holder` has access to its own database connection. I'm not sure this is the correct way to set things up. It's possible that my design pattern is entirely off here. What's the right way to set this kind of thing up? Do I need locks somewhere? Are threads even the right tool for this job? Looking for some general advice here...
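For context, a stripped-down version of just the fetch step I'm trying to parallelize, with no Neo4j involved (here `fetch` is a hypothetical stand-in for the slow web request):

```python
import concurrent.futures
import time

def fetch(url):
    # Hypothetical stand-in for the slow web request;
    # a real version would use requests.get(url) or similar.
    time.sleep(0.1)
    return f"data from {url}"

urls = [f"http://example.com/{i}" for i in range(4)]

# Threads suit I/O-bound work like web requests: while one thread
# waits on the network, others can run. executor.map returns the
# results in the same order as the inputs.
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(fetch, urls))

print(results[0])  # data from http://example.com/0
```

This part seems straightforward; my uncertainty is about how the database writes should fit around it.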