Text-to-Cypher Datasets for LLM Fine-Tuning: Query Complexity Levels

Good morning,

I am a student working on a Text-to-Cypher project. I have been reviewing the literature on training datasets used to fine-tune LLMs to improve their Cypher generation capabilities. While the literature often highlights the difficulty LLMs face with advanced Cypher queries, such as third-level or higher navigation queries, I have not found examples of such queries in the synthetic datasets used to train these models.
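For concreteness, by n-th-level navigation I mean a query that traverses n relationship hops. A minimal sketch of the contrast on a hypothetical movie graph (the labels and the $name parameter are just my illustration, not from any particular dataset):

```cypher
// First-level navigation: a single relationship hop.
MATCH (p:Person {name: $name})-[:ACTED_IN]->(m:Movie)
RETURN m.title;

// Third-level navigation: three chained hops, the kind of query
// the literature flags as hard for LLMs to generate.
MATCH (p:Person {name: $name})-[:ACTED_IN]->(m:Movie)
      <-[:ACTED_IN]-(co:Person)-[:ACTED_IN]->(other:Movie)
RETURN DISTINCT other.title;
```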

Is this observation correct? And do you think it is reasonable to focus on first-level (single-hop) queries when creating a synthetic dataset for fine-tuning, as I have done for some LLMs?

Thank you very much

You’ll find that complex queries are hard, if not impossible, to find in public sources. I have a 700-line statement that creates n-level graphs from parameters, and I had to build it step by step, with a lot of trial and error.
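To give a flavour of what that involves, here is a toy sketch (my own minimal illustration with made-up $children and $grandchildren parameters, not a fragment of the real statement) that builds just a two-level tree from parameters:

```cypher
// Toy sketch: a parameterised fragment building a two-level tree.
// Every additional level adds another WITH/UNWIND stage like the
// ones below, which is roughly how such statements balloon in size.
CREATE (root:Node {id: 0})
WITH root
UNWIND range(1, $children) AS i
CREATE (root)-[:CHILD]->(c:Node {id: i})
WITH c, i
UNWIND range(1, $grandchildren) AS j
CREATE (c)-[:CHILD]->(:Node {id: i * 100 + j})
```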

Most of those use cases, which are exactly the ones you are seeking, are never going to be public.