Spring Data Neo4j 7.1.2 - findById() and findAll() causing infinite loop/crash

Hello!
We implemented a REST backend for a typical business application based on Neo4j and Spring Data.
In the beginning we used the generic findById() and findAll() methods to read the data.
But over time our model kept growing, and our domain model now consists of about 100 classes. Along the way we ran more and more often into cyclic dependencies in the graph that resulted in StackOverflowErrors in the Spring Data mapper.
(Even though we did not use bidirectional relationship definitions in our Spring Data model classes!)
As the model grows, it is really hard not to overlook some path that creates a cycle in the graph.
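A minimal plain-Java sketch of why one-way references alone do not prevent cycles. This is a generic illustration, not SDN's actual mapping code: the hypothetical GraphNode class only has outgoing references, yet the instances Company, Project and Person link back to Company, so a naive recursive walk never terminates while a depth-bounded walk does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type: every reference is outgoing ("unidirectional").
final class GraphNode {
    final String name;
    final List<GraphNode> outgoing = new ArrayList<>();

    GraphNode(String name) {
        this.name = name;
    }
}

final class CycleDemo {
    // Company -> Project -> Person -> Company: three one-way links, one cycle.
    static GraphNode buildCycle() {
        GraphNode company = new GraphNode("Company");
        GraphNode project = new GraphNode("Project");
        GraphNode person = new GraphNode("Person");
        company.outgoing.add(project);
        project.outgoing.add(person);
        person.outgoing.add(company); // e.g. a person's employer
        return company;
    }

    // Naive walk: follows every outgoing reference with no stop condition,
    // so on a cyclic instance graph it recurses until StackOverflowError.
    static int naiveCount(GraphNode node) {
        int count = 1;
        for (GraphNode next : node.outgoing) {
            count += naiveCount(next);
        }
        return count;
    }

    // Depth-bounded walk: returns the number of visits within maxDepth hops.
    // This is the idea behind a custom query that only follows explicitly
    // listed relationships to a fixed depth.
    static int boundedCount(GraphNode node, int maxDepth) {
        int count = 1;
        if (maxDepth > 0) {
            for (GraphNode next : node.outgoing) {
                count += boundedCount(next, maxDepth - 1);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        GraphNode root = buildCycle();
        System.out.println(boundedCount(root, 2)); // prints 3
        try {
            naiveCount(root);
        } catch (StackOverflowError e) {
            System.out.println("naive walk: StackOverflowError");
        }
    }
}
```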
As a consequence, we removed ALL generic findBy and findAll calls and replaced them with custom queries in our extensions of the CRUD repository interfaces provided by Spring Data Neo4j.
Now we control exactly to which "depth" the data is read, which properties are read (and which are not), which relationships are traversed to included objects, and how deep.
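A hedged sketch of such a depth-controlled custom query, using the Company example from this post (Company, its Address objects and its parent companies). The labels Company and Address, the relationship types HAS_ADDRESS and HAS_PARENT, and the class and method names are hypothetical. Returning the root node together with collected relationships and related nodes is the pattern SDN's documentation describes for custom queries with related entities; check it against the docs for your version. This is not the poster's actual query.

```java
// Java 17 text block; labels and relationship types are hypothetical.
final class CompanyQueries {
    // Follows exactly two relationship types, one hop each, so a cycle elsewhere
    // in the graph (e.g. the parent of a parent of ...) is never walked.
    static final String FIND_ALL_WITH_ADDRESS_AND_PARENTS = """
            MATCH (c:Company)
            OPTIONAL MATCH (c)-[ra:HAS_ADDRESS]->(a:Address)
            OPTIONAL MATCH (c)-[rp:HAS_PARENT]->(p:Company)
            RETURN c, collect(DISTINCT ra), collect(DISTINCT a),
                   collect(DISTINCT rp), collect(DISTINCT p)
            """;

    // Possible use on a repository method (hypothetical interface):
    //   @Query(CompanyQueries.FIND_ALL_WITH_ADDRESS_AND_PARENTS)
    //   List<Company> findAllWithAddressAndParents();

    public static void main(String[] args) {
        System.out.println(FIND_ALL_WITH_ADDRESS_AND_PARENTS);
    }
}
```

Because the string is a compile-time constant, it can be referenced directly from an @Query annotation; DISTINCT inside collect() removes the duplicates produced by combining the two OPTIONAL MATCH clauses.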
And as a nice "side effect": the custom queries fetch everything in one go, while the generic Spring Data "find methods" go through the related nodes "one by one" with additional queries. This change had a dramatic effect on the performance of our system!
For example, when I read all "Company" objects in our system, including their related Address objects and relationships to "parent companies", the custom replacement for findAll() currently runs 30 times faster (for about 200 companies), even though this model object is of very limited complexity.
So I can only recommend not relying on the easy-to-use but "dangerous" and slow findBy() and findAll() methods provided by Spring Data if you are implementing a backend for a larger business application.
Best regards!
Alexander.

Hi Alexander,

Thanks for your answer. Yes, the program has evolved, and I have removed more and more of the "standard" SDN CRUD functions. With a few DTOs you can access connected nodes quickly (in my application these are book-author and book-publish-date relationships, which I want to keep as separate nodes for better search performance) and, for example, print a book together with its author. The only "predefined" query I still use extensively is a simple boolean check whether a node with a specific ID exists at all. I think that one is pretty fast; at least I did not see any performance difference compared to writing a custom query for it (although that might change as the dataset grows).

I use both SDN and the driver in my applications. They each have a purpose. I feel SDN is a viable choice when your data model consists of domain entities, such as invoices, customers, etc. These entities have a fixed structure and don’t have relationships between them. As such, they can be managed individually with the SDN methods.

The important thing to note is that when you save, SDN makes the database match your entity, so your entity needs to contain all the data, including all related data. If it does not, the missing related data will no longer be connected to your entity in the database. From this, I concluded that SDN is not appropriate for managing networks of data. Examples would be a social network, a rail system, an IT network, etc. For these cases I create my own repository that implements, using the driver, all the data manipulation operations I need on the graph and its entries.

Hi Gery,
we also ended up with a mix of queries defined in repositories that extend CrudRepository and more individual implementations. The queries in the repositories are quite easy to implement and maintain.
However, as soon as a query has to be adapted to specific needs, e.g. dedicated MATCH clauses that depend on (optional) query params of a GET request, it is much easier to build such queries with the Neo4jTemplate class, even if the template is not as powerful as it should be.
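A minimal sketch of assembling such a query from optional GET parameters, not the poster's implementation. The parameters name and city, the Company/Address labels, the HAS_ADDRESS type and the CypherQuery/CompanySearchQueryBuilder types are all hypothetical. Values go into a parameter map instead of being concatenated into the Cypher text; the result could then be handed to something like neo4jTemplate.findAll(query.cypher(), query.parameters(), Company.class).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Cypher text plus the values for its $parameters.
record CypherQuery(String cypher, Map<String, Object> parameters) {}

final class CompanySearchQueryBuilder {
    private static boolean present(String value) {
        return value != null && !value.isBlank();
    }

    // name and city are optional GET parameters (null or blank = not given).
    // Values are bound as parameters, never concatenated into the Cypher text.
    static CypherQuery build(String name, String city) {
        StringBuilder cypher = new StringBuilder("MATCH (c:Company)");
        List<String> conditions = new ArrayList<>();
        Map<String, Object> params = new LinkedHashMap<>();

        if (present(name)) {
            conditions.add("c.name CONTAINS $name");
            params.put("name", name);
        }
        if (present(city)) {
            // The address pattern is only added when it is needed for filtering.
            cypher.append(" MATCH (c)-[:HAS_ADDRESS]->(a:Address)");
            conditions.add("a.city = $city");
            params.put("city", city);
        }
        if (!conditions.isEmpty()) {
            cypher.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        // DISTINCT: a company with several matching addresses is returned once.
        cypher.append(" RETURN DISTINCT c");
        return new CypherQuery(cypher.toString(), params);
    }

    public static void main(String[] args) {
        CypherQuery query = build("Acme", "Berlin");
        System.out.println(query.cypher());
        System.out.println(query.parameters());
    }
}
```

Keeping the dynamic part to string assembly plus a parameter map means the query shape can vary per request while Neo4j can still reuse plans for identical query strings.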
And we NEVER EVER use the repository save() methods for objects that do or can contain relationships, except to create an initial (empty) object/node that we afterwards fill with properties and connect by drawing the relationships "manually".
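One way to "draw the relationships manually" is a targeted write that touches only the two nodes involved, so nothing else is reconciled against an in-memory entity. A minimal sketch, not the poster's code: the Document and AttributeValue labels, the HAS_ATTRIBUTE type, the id property and the RelationshipStatements class are hypothetical. The Neo4jClient call in the comment reflects that API as I understand it; verify it against your SDN version.

```java
import java.util.Map;

final class RelationshipStatements {
    // Connects an existing Document to an existing AttributeValue. Only these two
    // nodes and the one relationship are touched; other relationships are neither
    // read nor rewritten. If either node is missing, MATCH finds nothing and no
    // relationship is created. MERGE means a second sequential run adds no duplicate.
    static final String LINK_ATTRIBUTE = """
            MATCH (d:Document {id: $documentId})
            MATCH (v:AttributeValue {id: $valueId})
            MERGE (d)-[:HAS_ATTRIBUTE]->(v)
            """;

    static Map<String, Object> linkParameters(String documentId, String valueId) {
        return Map.of("documentId", documentId, "valueId", valueId);
    }

    // With Spring's Neo4jClient this could be executed roughly as:
    //   neo4jClient.query(LINK_ATTRIBUTE).bindAll(linkParameters(docId, valueId)).run();

    public static void main(String[] args) {
        System.out.println(LINK_ATTRIBUTE);
        System.out.println(linkParameters("doc-1", "value-7"));
    }
}
```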

That sounds logical. In projects where I don’t need SDN, I use the driver directly to avoid the SDN dependency. I have been thinking of using Neo4jClient as a substitute for the driver because of its fluent API and Spring transaction integration. I would have to refactor a lot of queries though, so no rush. Maybe on my next project. The advantage of the client over the template is that the client is not concerned with domain entities.

The advantage I get from using Neo4jTemplate is that I can retrieve a full "submodel" of the graph with one single call to the Neo4j database.
E.g. we have a quite complex model for "Document" objects with a lot of attributes that can be defined generically, to some extent comparable to a complex generic document management system.
To get all Documents of a project (>1000), I need a query that reads all Document objects, including all properties and related attribute values (related nodes), in one single DB call.
Each Document can include up to about 100 other nodes connected by outgoing relationships, which may in turn have child nodes.
With Neo4jTemplate (or even with the repository) I can write a single (quite long) Cypher query that defines exactly which data I want from the subgraph. In the end, the Spring Data mapper simply gives me a List of Document objects, with all contained objects populated to the defined level of detail.
This call runs in under 300 ms on my developer machine, including the mapping to my Java domain classes, which I think is really impressive.
I'm not sure this is possible with the standard Java client. It would be good to know.

The driver and Neo4jClient are not aware of your domain model. You have to map the query results to your objects yourself.
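To make that concrete: with the driver or Neo4jClient, results arrive as generic rows (for example Record.asMap() in the Java driver, or the Collection of Map<String, Object> that Neo4jClient's fetch().all() returns), and turning them into domain objects is your own code's job. A minimal sketch of that mapping step; the Document record, the column names and DocumentRowMapper are hypothetical, and the rows are built by hand instead of coming from a database.

```java
import java.util.List;
import java.util.Map;

// Hypothetical domain object; the driver and Neo4jClient know nothing about it.
record Document(String id, String title, List<String> attributeNames) {}

final class DocumentRowMapper {
    // Maps one result row, e.g. produced by
    //   RETURN d.id AS id, d.title AS title, collect(a.name) AS attributeNames
    // into a Document. A missing attributeNames column yields an empty list.
    static Document fromRow(Map<String, Object> row) {
        @SuppressWarnings("unchecked")
        List<String> names = (List<String>) row.getOrDefault("attributeNames", List.of());
        return new Document((String) row.get("id"), (String) row.get("title"), List.copyOf(names));
    }

    static List<Document> fromRows(List<Map<String, Object>> rows) {
        return rows.stream().map(DocumentRowMapper::fromRow).toList();
    }

    public static void main(String[] args) {
        // Rows built by hand here; in practice they would come from the database.
        List<Map<String, Object>> rows = List.of(
                Map.of("id", "doc-1", "title", "Spec", "attributeNames", List.of("author", "version")),
                Map.of("id", "doc-2", "title", "Memo"));
        fromRows(rows).forEach(System.out::println);
    }
}
```

For a nested model like the Document example above, this hand-written mapping is exactly the work the SDN mapper does for you, which is the trade-off being discussed.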

I agree with what you are saying.