Efficient query building

Hi, I'm currently using the official Python driver in an application.

A part of this app allows people to upload files, usually a CSV or another consistent structure, that are then merged into Neo4j. Right now, for every file type we come across, we write a new custom Cypher query for that data type. We do that to be as efficient as possible, since everything online seems to indicate that parameterized queries are the only correct way to do this.

We started out with a models approach, similar to how things like py2neo do it, but saw obscene performance issues, as it was effectively starting a new session / transaction for each column of each row.

What I would like to know is where the "actual" optimization of parameterization takes place. Is the database doing some kind of caching of the query? Is there something else the driver is doing that I don't know of? Documentation on how this part functions seems to be difficult to find.

My goal is to have a part of the application where a user can tell the application the names of their columns, what type of data is in them, and what the relationships between them are. Then I am hoping I can do something similar to string building of my query on the backend, to create a "dynamic", yet efficiently parameterized query to use for that upload.

Not sure this is exactly what you are after, but: we cache generated query plans. We do not cache query results.

Further, with reference to your comment of

My goal is to have a part of the application where a user can tell the application the names of their columns, what type of data is in them, and what the relationships between them are. Then I am hoping I can do something similar to string building of my query on the backend, to create a "dynamic", yet efficiently parameterized query to use for that upload.

Parameters generally refer to values of properties. For example, you might write a query similar to

match (n:Person) where n.age=$param1 return n;  {param1: 42}

and if you then resubmit

match (n:Person) where n.age=$param1 return n;  {param1: 49}

then we would not need to replan the query. But if you are suggesting someone might run

match (n:Person) where n.age=42

and someone else might run

match (n:Person) where n.zipcode=02112;

then each of these two queries would need to be planned separately.
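To illustrate that distinction in plain Python (the query texts are illustrative, and distinct strings stand in here for plan-cache entries, since the server keys its plan cache on the exact query text):

```python
ages = [42, 49, 35]

# Literal values baked into the query text: three distinct strings,
# so the server would have to plan three times.
inlined = {f"match (n:Person) where n.age={age} return n" for age in ages}

# Parameterized: one string regardless of the values, so one cached plan
# serves every submission.
parameterized = {"match (n:Person) where n.age=$param1 return n" for _ in ages}

print(len(inlined), len(parameterized))  # 3 1
```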

Thanks for the quick reply @dana.canzano, this helps.

So, from my understanding: if I pre-build a query for a file structure using $params, then as long as the query text doesn't change, its plan will be kept in the cache and stay performant?

So if I build it as something like..

label1 = "Person"
label2 = "Something"
rel = "has_rel"

query = "merge (n:" + label1 + " {data: $param}) "
query += "merge (n)-[:" + rel + "]->(m:" + label2 + " {data: $param2}) "
query += "return n"

...

tx.run(query, param=myAge, param2=myOtherData)
tx.run(query, param=myAge2, param2=myOtherData2)

Then as long as the query doesn't change, it won't need to replan?

Correct. However, just for specifics, parameters cannot be used for label names; i.e., one cannot run

match (n:$param1) where n.age=42  {param1: "Person"}

Not that your example is doing this, but it is worth noting.
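Since labels and relationship types cannot be parameterized, they have to be spliced into the query text, which makes validating user-supplied names important. A minimal sketch (the helper and regex are my own, not part of the driver; only simple identifier-style names are allowed):

```python
import re

# Allow only simple identifiers so user-supplied names cannot
# inject arbitrary Cypher into the built query text.
VALID_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def build_merge_query(label1, rel, label2):
    """Build a merge query with validated labels/relationship type;
    property values are still passed as $params at run time."""
    for name in (label1, rel, label2):
        if not VALID_NAME.match(name):
            raise ValueError(f"unsafe label or relationship type: {name!r}")
    return (
        f"merge (n:{label1} {{data: $param}}) "
        f"merge (n)-[:{rel}]->(m:{label2} {{data: $param2}}) "
        "return n"
    )

print(build_merge_query("Person", "has_rel", "Something"))
```

The returned query text stays constant for a given file structure, so the plan cache still applies; only the property values change per row.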

Great, thank you! This will help us optimize how we structure a lot of queries.