Slow streaming with neo4j driver compared to py2neo

I ran the same query on a virtual graph I created, which contains 1 million nodes, with both the neo4j and py2neo drivers.

The returned data and the number of fetched nodes were identical with both drivers, but the neo4j driver took significantly longer than py2neo to fetch all the nodes: almost 3 times as long. The same difference occurred also with

After creating the virtual graph, I used the following snippet to measure the durations (Python 3.9):

# py2neo driver
graph = py2neo.Graph(f"bolt://{db_host}:{db_port}", auth=(user, password))
start = time.time()
graph.run(f"CALL …('{graph_name}') YIELD nodeId").data()
logging.info("py2neo driver: nodes fetched after: %s seconds", time.time() - start)

# neo4j driver
graph = neo4j.GraphDatabase.driver(f"bolt://{db_host}:{db_port}", auth=(user, password))
session = graph.session()
start = time.time()
session.run(f"CALL …('{graph_name}') YIELD nodeId").data()
logging.info("neo4j driver: nodes fetched after: %s seconds", time.time() - start)

The output is:

INFO:root:py2neo driver: nodes fetched after: 14.329415798187256 seconds
INFO:root:neo4j driver: nodes fetched after: 40.44703483581543 seconds
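
Since each number above comes from a single time.time() measurement, repeating the call a few times would help rule out warm-up or one-off noise. The following is only a minimal sketch of such a harness: time_call and summarize are hypothetical helper names, not part of either driver, and the callable passed in is assumed to wrap one of the run(...).data() calls.

```python
import statistics
import time


def time_call(fn, repeats=3):
    """Call fn() `repeats` times; return (last_result, durations_in_seconds)."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
        result = fn()
        durations.append(time.perf_counter() - start)
    return result, durations


def summarize(label, durations):
    """Format the minimum and median duration for one driver."""
    return (
        f"{label}: min={min(durations):.3f}s "
        f"median={statistics.median(durations):.3f}s over {len(durations)} runs"
    )


# Usage (not executed here; `graph`, `session` and `query` are assumed to exist):
#   _, py2neo_times = time_call(lambda: graph.run(query).data())
#   _, neo4j_times = time_call(lambda: session.run(query).data())
#   print(summarize("py2neo driver", py2neo_times))
#   print(summarize("neo4j driver", neo4j_times))
```

Comparing the minimum and the median of several runs makes it easier to tell whether the gap is stable or only shows up on the first call.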

I tried to increase the fetch_size of the neo4j.Session object but it barely changed the result.
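
For reference, this is roughly how the fetch size can be varied. It is a sketch under assumptions: it relies on the 4.x driver accepting fetch_size as a keyword of driver.session(), sweep_fetch_sizes is a hypothetical helper name, and it only counts rows while iterating instead of calling .data(), so its timings are not directly comparable to the numbers above.

```python
import time


def sweep_fetch_sizes(driver, query, fetch_sizes):
    """Time `query` once per fetch size; return a list of (fetch_size, row_count, seconds)."""
    results = []
    for fetch_size in fetch_sizes:
        start = time.perf_counter()
        # Assumption: the driver accepts fetch_size as a session keyword (neo4j 4.x).
        with driver.session(fetch_size=fetch_size) as session:
            # Iterate the result to pull every record without building dicts.
            row_count = sum(1 for _ in session.run(query))
        results.append((fetch_size, row_count, time.perf_counter() - start))
    return results


# Usage with a real driver (not executed here):
#   driver = neo4j.GraphDatabase.driver(f"bolt://{db_host}:{db_port}", auth=(user, password))
#   for fetch_size, rows, seconds in sweep_fetch_sizes(driver, query, [1000, 10000, 100000]):
#       print(fetch_size, rows, f"{seconds:.2f}s")
```

If the timings stay flat across fetch sizes, as they did for me, the bottleneck is probably not the number of round trips to the server.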

The driver versions:
neo4j==4.3.3 and also tested with 4.3.2

I used a local neo4j docker image: neo4j:4.3.2-community
The GDS library version is: 1.6.2

I created the graph with the attached data, duplicated x1000 in order to create 1 million nodes:
graph_dataset.txt (31.0 KB)

I would appreciate your help in understanding why the neo4j driver is so slow compared to py2neo, and whether there is any way to improve its performance.

Thanks a lot
