Hello community! Thanks for your feedback on the pros and cons of using OGM in Java Spring Boot vs direct Cypher queries in application code.
Question 1)
Neo4j Team - do you have data on the percentage of corporate application development projects done in OGM vs direct Cypher in Code?
Question 2) Is direct Cypher in code in Spring Boot considered to be a professional design pattern by the Neo4j team and community? I'm working with a Java programmer and architect who consider only OGM to be a design pattern, and look down on direct Cypher within applications as being unprofessional.
Question 3) I am seeing many advantages to direct Cypher in application code including:
Speed of development - writing and testing complex Cypher seems much faster than doing complex queries in OGM and learning a second query language with its own quirks.
Ease of QA. Writing queries, QA and debugging of direct Cypher in application code seems faster and less complex than OGM, and can be done by a Neo4j expert vs a Java coder.
Performance and index tuning. I've not had direct experience working with OGM, but there is likely a steep learning curve, and there are likely challenges in tuning Cypher-based OGM. What would the tuning process even be?
Overall, at the start of the current project, working with the Spring Boot DSL forced our Java programmer to learn the DSL version of Cypher, and it has been very hard to test and QA. My impression at this time is that, for a team without extensive OGM experience, using OGM instead of direct Cypher would increase development time and cost by 50 to 100% on a 3-month corporate backend API project.
I welcome discussion on whether direct Cypher in Spring Boot is considered a professional design practice, and on the pros/cons of direct Cypher vs OGM, especially with regard to real-world project development time, performance tuning and QA. Thank you in advance.
I'll try my best to answer your questions (from the perspective of a Neo4j-OGM / Spring Data Neo4j developer).
Q1)
This is hard to tell because there is no real measurement in place to see how the libraries (the pure driver in contrast to an OGM that uses the driver) are used. The raw Maven Central download stats don't give us those insights.
I would say both are used at the top level (developer access) in roughly equal amounts.
Q2)
There is even one option missing when talking about Spring Boot (also valid with Neo4j-OGM for Quarkus):
Pure Java driver and Cypher queries
Full OGM feature set with generated Cypher and mapping
Raw (direct) Cypher statements sent via those OGMs
From my experience, none of these is wrong or a bad design decision. As always, it heavily depends on the use-case.
Given that OGMs are general purpose, we strive to support a lot of use-cases on a best-effort basis. But there are graphs or use-cases that we just cannot cover very efficiently while keeping compatibility with other use-cases.
This is mostly when users introduce their own Cypher statements to "tune" Neo4j-OGM / SDN but leave the mapping to those frameworks.
There are also use-cases where the number of classes to map is manageable (if there is something to map at all). Those applications mostly don't even use Neo4j-OGM or SDN. Side note: Spring Data Neo4j also has a Spring-transaction-aware wrapper around the Neo4j Java driver that can be used without any automatic mapping.
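To give a feeling for what "manageable" hand-written mapping looks like, here is a minimal sketch in plain Java. Everything in it is an assumption made for illustration: the Person class, the FIND_BY_NAME query and the PersonMapper helper are hypothetical names, and a result row is modeled as a Map<String, Object> keyed by column alias, which is roughly what the Java driver's Record.asMap() hands back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical domain class; the name and fields are illustrative only.
final class Person {
    final String name;
    final long born;

    Person(String name, long born) {
        this.name = name;
        this.born = born;
    }
}

// Hand-written mapping for a single class, with the Cypher kept next to it.
final class PersonMapper {
    // Plain string with named parameters, so the same text can be pasted
    // into a Cypher console and tested there (assumed example query).
    static final String FIND_BY_NAME =
        "MATCH (p:Person {name: $name}) RETURN p.name AS name, p.born AS born";

    // Maps one result row (column alias -> value) to a Person.
    static Person fromRow(Map<String, Object> row) {
        Object name = row.get("name");
        Object born = row.get("born");
        if (!(name instanceof String) || !(born instanceof Number)) {
            throw new IllegalArgumentException("Unexpected row shape: " + row);
        }
        return new Person((String) name, ((Number) born).longValue());
    }

    // Maps all rows of a result, preserving their order.
    static List<Person> fromRows(List<Map<String, Object>> rows) {
        List<Person> result = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            result.add(fromRow(row));
        }
        return result;
    }
}
```

For one or two classes this is little code. The cost shows up when the class graph grows and every relationship needs its own mapping and its own tests.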
Q3)
1.)
learning a second query language
Sorry, I don't get this point. With OGMs and without any Neo4j experience, I would even say that it is much easier to jump into the world of graphs without learning Cypher first.
2.)
Definitely true. It heavily depends on where the development team comes from. But having a reasonably large class graph will push them to write the boilerplate mapping code manually. And this can also have a huge impact on the test suite.
3.)
Creating indexes is usually done outside the OGM.
As mentioned in Q2, it is perfectly fine to write custom Cypher and hand it over to the OGM. So there is an option to tune the queries created by the OGM by replacing them. Of course, those queries must fulfil some requirements regarding the data that gets returned (e.g. relationships and related nodes), but this is documented or can be seen in the (verbose) log output.
4.)
Do you mean the Cypher-DSL? Even though I think it is a great library to help people write correct Cypher, there is no need to use it.
From my experience, I think it depends on your needs. For example, I have one microservice I built that manages orders. An order is independent of all other orders and consists of a few one-to-one relationships and a one-to-many relationship to order items. This is very easily modeled in SDN, and the repository methods provided meet my needs to create and fetch orders. I created a few projections using interfaces to get summaries of orders. This was also very easy to do in SDN.
On the other hand, I have a microservice that manages mixed entities that are part of a hierarchy of these entities. There is no limit to the depth or breadth of these structures. For this case, I didn’t find SDN a good fit, as I am creating/updating a node at a time, and possibly adding/removing child nodes that already exist. I also need to traverse the graph and calculate complex metrics. I wrote custom procedures for this.
For this use case, I found the Java driver a good fit, as I could create methods in my repository to perform the specific operations I need on the graph, and I can call my custom procedures and process the results very easily.
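A minimal sketch of what such a repository might look like, written in plain Java so it runs without a database. All names here are assumptions for illustration: CypherRunner is a hypothetical stand-in for a driver session (with the real Java driver you would delegate to something like Session.run(query, params)), example.subtreeMetrics is a made-up procedure name, and the Entity label and HAS_CHILD relationship type are invented.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a driver session: runs Cypher with named
// parameters and returns each row as a map of column alias to value.
interface CypherRunner {
    List<Map<String, Object>> run(String cypher, Map<String, Object> params);
}

// Repository whose methods each wrap one specific graph operation.
final class HierarchyRepository {
    // Calls a custom procedure; the procedure name is made up for this sketch.
    static final String SUBTREE_METRICS =
        "CALL example.subtreeMetrics($rootId) YIELD nodeCount, maxDepth "
        + "RETURN nodeCount, maxDepth";

    // Links two nodes that already exist; labels and types are assumed.
    static final String ADD_CHILD =
        "MATCH (parent:Entity {id: $parentId}), (child:Entity {id: $childId}) "
        + "MERGE (parent)-[:HAS_CHILD]->(child)";

    private final CypherRunner runner;

    HierarchyRepository(CypherRunner runner) {
        this.runner = runner;
    }

    // Attaches an existing node as a child of another existing node.
    void addExistingChild(String parentId, String childId) {
        Map<String, Object> params = new HashMap<>();
        params.put("parentId", parentId);
        params.put("childId", childId);
        runner.run(ADD_CHILD, params);
    }

    // Reads one metric from the single row the procedure is assumed to yield.
    long subtreeNodeCount(String rootId) {
        List<Map<String, Object>> rows =
            runner.run(SUBTREE_METRICS, Collections.singletonMap("rootId", (Object) rootId));
        if (rows.isEmpty()) {
            throw new IllegalStateException("No metrics returned for " + rootId);
        }
        return ((Number) rows.get(0).get("nodeCount")).longValue();
    }
}
```

Keeping the session behind a small interface is a design choice, not something the driver requires. It lets the repository be unit-tested with a fake runner, while the Cypher strings themselves can still be tested directly against a database.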