Assess query performance with Python driver

jfisher12 · March 4, 2021, 5:16pm

I'm working my way through ~50 metrics for potential use as machine learning features. I want to compare different queries that retrieve the same information, and to compare resource use when the data is stored under different graph models and indexing schemes.

I've created three versions of the data (Neo4j version 4.1.3) and stored them in Neo4j Desktop. I access the databases individually using the Neo4j Python driver and dump the results into a Jupyter Notebook.

The Cypher Workflow document indicates the queries return a stream of records, header and footer metadata, and a result summary that contains additional information relating to query execution and result content (which includes the information for EXPLAIN and PROFILE).

The Knowledge Base has an article on how to get much of this information using Cypher Shell, but I can't find the directions or functions that would let me access it through the Python driver.

To help optimize my queries, graph models, and indexing, I want to compare query performance information such as heap size, physical memory, run time, caching, CPU use, thread count, and records returned.

This article on Data Science Stack Exchange references something similar done in Neo4j in Action.

I'd like to pull this information programmatically, in real time, into the Jupyter Notebook when I run each query (the performance information can be returned alone or together with the query results), without having to monitor a separate application such as Halin.

jfisher12 · March 12, 2021, 8:17pm

For those interested, I was able to work out the answer.

In the Jupyter notebook I ran the query and stored the Result object, then was able to access the metadata:

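from neo4j import GraphDatabase

# Connect to the local database over Bolt
uri = "bolt://localhost:7687"
driver = GraphDatabase.driver(uri, auth=('neo4j', 'password'))
session = driver.session()

# PROFILE asks the server to attach the executed plan to the result summary
count1 = session.run('''PROFILE MATCH (c)
RETURN count(c) AS node_count''')

# consume() discards any unread records and returns the result summary
count1.consume().metadata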

The .consume() method returns a result summary object. Other information is available from it by replacing metadata with attributes such as result_available_after, result_consumed_after, or profile.
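
As a rough sketch of what can be done with the profile attribute: for a PROFILE query it holds the plan as a nested dictionary, one entry per operator. The helper below walks that tree and adds up the db hits. The name total_db_hits is my own, and the "dbHits" and "children" keys are an assumption about what the server returns, so inspect your own output before relying on them. The example plan is hand-written for illustration, not real output.

```python
def total_db_hits(plan):
    """Sum db hits over a nested PROFILE plan dictionary.

    Assumes each operator is a dict with an optional "dbHits" count and
    an optional "children" list (key names are an assumption).
    """
    if not plan:
        return 0
    hits = plan.get("dbHits", 0)
    # Recurse into child operators and accumulate their hits as well.
    for child in plan.get("children", []):
        hits += total_db_hits(child)
    return hits


# Hand-written stand-in for summary.profile, for illustration only.
example_plan = {
    "operatorType": "ProduceResults",
    "dbHits": 0,
    "children": [
        {"operatorType": "NodeCountFromCountStore", "dbHits": 1, "children": []},
    ],
}
print(total_db_hits(example_plan))
```

With a live session this would be called as total_db_hits(count1.consume().profile).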

Additional details can be found in the Result section of the Python Driver API Documentation.
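
To compare several queries side by side, the same idea can be wrapped in a loop. This is only a sketch: collect_timings and the row keys are names I made up, and it only relies on session.run(), consume(), and the two timing attributes mentioned above, which the driver reports in milliseconds.

```python
def collect_timings(session, queries):
    """Run each named query and return one row of summary fields per query.

    `session` is anything with a run() method whose result supports
    consume(), such as a neo4j driver session. `queries` maps a label
    of your choosing to a Cypher string.
    """
    rows = []
    for name, query in queries.items():
        # consume() discards unread records and returns the result summary.
        summary = session.run(query).consume()
        rows.append({
            "name": name,
            "available_after_ms": summary.result_available_after,
            "consumed_after_ms": summary.result_consumed_after,
        })
    return rows
```

The returned list of dictionaries can be passed straight to pandas.DataFrame in the notebook if a table is easier to read. Because consume() throws away any records not yet read, fetch the records first if you need them as well as the timings.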

A working example of the code can be found in a notebook on my GitHub page.
