Python: neo4j.Result.to_df() is slow to convert a very small result to a pandas DataFrame

Hi all,

I am interfacing with my Neo4j DBMS from Python using the GraphDatabase driver.

I am running a Cypher query that computes personalized PageRank on a graph projection and returns a single value. This runs fast enough (~0.8 s for a graph with about 12k nodes and 200k edges).

Somehow, converting the result to a pandas DataFrame takes ~70 s (Neo4j v5.23.0).

This is my code snippet:

import neo4j
from neo4j import GraphDatabase

driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))
n4j = driver.session(database=NEO4J_DATABASE)

cypher_1 = """
MATCH (g:Gene{name:$excluded_source})
RETURN id(g) AS excluded_source_id
"""

cypher_2 = """
MATCH (crc_genes:Gene{source:'TCGA'}) WHERE crc_genes.name <> $excluded_source
WITH crc_genes
CALL gds.pageRank.stream($graph_proj_name, {
dampingFactor: 0.85,
maxIterations: 20,
sourceNodes: [id(crc_genes)]
}) YIELD nodeId, score
WHERE nodeId = $excluded_source_id
WITH gds.util.asNode(nodeId) AS node, SUM(score) AS score
RETURN COALESCE(node.name, node.ENSP) AS gene_identifier, score
"""

# Cypher returning node ID of gene of interest
result_1 = n4j.run(cypher_1, excluded_source = EXCLUDED_SOURCE, graph_proj_name = PROJECTION_NAME_1)

excl_source_id = result_1.to_df()

EXCLUDED_SOURCE_ID = excl_source_id.loc[0, "excluded_source_id"]

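# Cypher calculating personalized page rank
# ($excluded_source_id is used in cypher_2, so it has to be passed as a parameter;
# int() converts the pandas/numpy integer to a plain Python int)
result_2 = n4j.run(cypher_2, excluded_source = EXCLUDED_SOURCE, excluded_source_id = int(EXCLUDED_SOURCE_ID), graph_proj_name = PROJECTION_NAME_1)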

# Line that takes over a minute to run!
df = result_2.to_df()

The output looks something like this:

print(df)
 gene_identifier     score
 0           APC  0.093877

In the past, on a different and smaller graph (1.3k nodes and 19k edges, Neo4j v5.21.0), this took a fraction of a second.
Edit: With the smaller graph, the line also runs quickly in v 5.23.0.

Why is it taking so long? I have tried returning just the score as a float; I also tried filtering before RETURN within the Cypher query, as you can see in the code, but neither changes anything.

Any help is appreciated.

The two cases (slow and fast at converting to a DataFrame) differed not only in graph size but, more importantly, in the number of source nodes I used for personalized PageRank.

In the end, this is what made the algorithm slower. What is still unclear is why the slowdown only shows up when neo4j.Result.to_df() is executed. My guess is that n4j.run(cypher) does not run the query to completion but only queues it, so that the work happens when the result is consumed?
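
One way to check this guess is to time the two steps separately. Below is a minimal sketch, not a definitive diagnosis: time_run_and_fetch is a hypothetical helper name, and it assumes only that the session object has a run() method returning a result with a to_df() method, as in the snippet above. As far as I understand the driver, records are streamed from the server as the result is consumed, so most of the query's cost would be expected to show up in the second timing.

```python
import time


def time_run_and_fetch(session, query, **params):
    """Time session.run() and result.to_df() separately.

    Hypothetical helper for illustration; `session` is expected to behave
    like the neo4j session used above (n4j).
    """
    t0 = time.perf_counter()
    # Submits the query and returns a result object; this does not
    # necessarily mean all records have been computed and fetched.
    result = session.run(query, **params)
    t1 = time.perf_counter()
    # Consumes the whole result stream and builds the DataFrame.
    df = result.to_df()
    t2 = time.perf_counter()
    return df, t1 - t0, t2 - t1
```

Usage with the names from the question would look like: df, run_s, fetch_s = time_run_and_fetch(n4j, cypher_2, excluded_source=EXCLUDED_SOURCE, excluded_source_id=int(EXCLUDED_SOURCE_ID), graph_proj_name=PROJECTION_NAME_1). If fetch_s is large even though the DataFrame has a single row, the time is going into the query itself rather than into the pandas conversion.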