Text-to-Cypher Datasets for LLM Fine-Tuning: Query Complexity Levels

Good morning,

I am a student working on a Text-to-Cypher project. I have been reviewing the literature on training datasets used to fine-tune LLMs to improve their Cypher generation capabilities. While the literature often highlights the difficulty LLMs face with advanced Cypher queries, such as third-level or higher navigation queries, I have not found examples of such queries in the synthetic datasets used to train these models.
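For concreteness, by n-th-level navigation I mean a query that traverses n relationship hops. A minimal sketch of the contrast on a hypothetical movie graph (the labels and the $name parameter are just my illustration, not from any particular dataset):

```cypher
// First-level navigation: a single relationship hop.
MATCH (p:Person {name: $name})-[:ACTED_IN]->(m:Movie)
RETURN m.title;

// Third-level navigation: three chained hops, the kind of query
// the literature flags as hard for LLMs to generate.
MATCH (p:Person {name: $name})-[:ACTED_IN]->(m:Movie)
      <-[:ACTED_IN]-(co:Person)-[:ACTED_IN]->(other:Movie)
RETURN DISTINCT other.title;
```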

Is this observation correct? And do you think it is reasonable to focus on first-level (single-hop) queries when creating a synthetic dataset for fine-tuning, as I have done for some LLMs?

Thank you very much

You’ll find that complex queries are hard, if not impossible, to find in public sources. I have a 700-line statement that creates n-level graphs from parameters, and I had to build it step by step, with a lot of trial and error.
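To give a flavour of what that involves, here is a toy sketch (my own minimal illustration with made-up $children and $grandchildren parameters, not a fragment of the real statement) that builds just a two-level tree from parameters:

```cypher
// Toy sketch: a parameterised fragment building a two-level tree.
// Every additional level adds another WITH/UNWIND stage like the
// ones below, which is roughly how such statements balloon in size.
CREATE (root:Node {id: 0})
WITH root
UNWIND range(1, $children) AS i
CREATE (root)-[:CHILD]->(c:Node {id: i})
WITH c, i
UNWIND range(1, $grandchildren) AS j
CREATE (c)-[:CHILD]->(:Node {id: i * 100 + j})
```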

Most of those use cases, which are exactly the ones you are seeking, are never going to be public.