Hello from Dresden, Germany

Hello from Germany!
My name is Alexander. I'm a Java developer from Germany and work as a software consultant with a focus on backends based on Spring Boot, Spring Data, JPA, ...
Additionally, I started developing OWL-based domain ontologies 20 years ago.
Five years ago I found the Neo4j project, but from the viewpoint of a Java developer the toolchain was not efficient enough for me to use it in commercial projects.
Two years ago I got the great opportunity to convince a business partner to develop a commercial Java backend based completely on a graph model, with Neo4j as the high-performance database.
And, fortunately, I realized that the Spring Data framework had evolved enough over the past years to be used efficiently for rapid Java development.
After one year of development our product went to market in January 2023 and since then has worked even better and faster than the customer expected. The product is still under heavy development to add more and more functionality, so we are happy to profit from the very good extensibility of the model in the graph database when new use cases require ongoing model extensions.
Of course, we had to deal with a lot of challenges in using Neo4j and the Spring Data OGM as the high-performance persistence layer of a typical commercial backend (replacing Postgres). So far we have managed all of them, and a couple of times we could impress our customers, especially with very fast read transactions on operations that produce large and complex result sets.
Our backend provides a typical REST API with JSON objects to web clients and the graph model is more or less completely hidden behind our OGM.
Maybe I can help the community here if somebody intends to use Neo4j with Spring Data in a Java backend and runs into one of the numerous challenges that come with it.
Best regards.
Alexander.

Hi @alexander.gehre, thanks for your introduction. This looks like an interesting topic regarding graph database performance. In my opinion, an explanation of it would be very helpful.

Greetings

Hi, let me give you a simplified example to explain how we boosted Neo4j performance and avoided StackOverflowError problems in our Java backend with simple-to-use standard Spring Data Neo4j functionality.

Think of a standard use case for a REST backend:
a GET on a Users endpoint that returns all registered users, including the necessary (sub-)data in the User objects, with one single REST call, so that the users can be displayed in a web frontend.

Let me start with a simplified demo model for User objects:

@Node
@Data
public class User {

    @Id
    @GeneratedValue(generatorClass = UUIDStringGenerator.class)
    private String uuid;
    private String title;
    private String firstName;
    private String lastName;
    private String email;
	
    @Relationship(type = "HAS_ADDRESS", direction = Relationship.Direction.OUTGOING)
    private Address address;

    @Relationship(type = "HAS_COMPANY", direction = Relationship.Direction.OUTGOING)
    private Company company;
	...
}

@Node
@Data
public class Address {

    @Id
    @GeneratedValue(generatorClass = UUIDStringGenerator.class)
    private String uuid;

    private String country;
    private String postalCode;
    private String city;
    private String street;
    private String houseNumber;
    ...

}

@Node
@Data
public class Company {

    @Id
    @GeneratedValue(generatorClass = UUIDStringGenerator.class)
    private String uuid;
    private String name;
    private String email;

    @Relationship(type = "HAS_ADDRESS", direction = Relationship.Direction.OUTGOING)
    private Address address;

    @Relationship(type = "HAS_PARENT_COMPANY", direction = Relationship.Direction.OUTGOING)
    private Company parentCompany;
	...
}

With a simple CRUD interface like

public interface UserRepository extends CrudRepository<User, String> { }

you can now easily write code like this:

// CrudRepository.findAll() returns an Iterable, not a List
Iterable<User> users = userRepository.findAll();

However, this is not a good idea if you have a lot of users in your backend.
Spring Data will start by fetching all Users in one call.
BUT: after that, it will resolve the Address for each User one by one.
The same happens with the Company.
And: in my small example it is possible to model a cyclic dependency via the parent Company. In that case you would get a StackOverflowError!

So you end up with really poor performance and a good chance of getting a severe error on the backend.
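
To make the cost visible, here is a minimal sketch in plain Java that only counts simulated round trips. It does not use Spring Data or Neo4j at all; CountingDb and all method names are hypothetical, and the assumption that Address and Company are each fetched with one extra call per User simply follows the description above rather than a measurement.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model that only counts simulated database round trips (no Spring, no Neo4j). */
final class RoundTripDemo {

    /** Hypothetical stand-in for a database; every method call counts as one round trip. */
    static final class CountingDb {
        int roundTrips = 0;

        List<String> loadUserIds(int userCount) {
            roundTrips++;
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < userCount; i++) {
                ids.add("user-" + i);
            }
            return ids;
        }

        String loadAddressOf(String userId) {
            roundTrips++;
            return "address-of-" + userId;
        }

        String loadCompanyOf(String userId) {
            roundTrips++;
            return "company-of-" + userId;
        }

        /** One call that returns users together with their address and company. */
        List<String> loadUsersWithAddressAndCompany(int userCount) {
            roundTrips++;
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < userCount; i++) {
                rows.add("user-" + i + " + address + company");
            }
            return rows;
        }
    }

    /** Assumed pattern: one call for all users, then two more calls per user. */
    static int roundTripsOneByOne(int userCount) {
        CountingDb db = new CountingDb();
        for (String id : db.loadUserIds(userCount)) {
            db.loadAddressOf(id);
            db.loadCompanyOf(id);
        }
        return db.roundTrips;
    }

    /** Everything fetched in a single call. */
    static int roundTripsSingleQuery(int userCount) {
        CountingDb db = new CountingDb();
        db.loadUsersWithAddressAndCompany(userCount);
        return db.roundTrips;
    }
}
```

Under these assumptions, 1000 users cost 1 + 2 * 1000 = 2001 round trips when resolved one by one, compared to a single round trip when everything is fetched together. The real numbers depend on the model and on how Spring Data Neo4j builds its queries.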

What can we do?
Simply do not use the "findAll()" method, but extend your Repository with a method including a custom query:


// The nested keys follow the pattern SourceLabel_RELATIONSHIP_TYPE_TargetLabel
@Query("""
        MATCH (user:User)
        RETURN user{.*,
            User_HAS_ADDRESS_Address:
                [(user)-[:HAS_ADDRESS]->(address:Address)
                    | address{.*}],
            User_HAS_COMPANY_Company:
                [(user)-[:HAS_COMPANY]->(company:Company)
                    | company{.uuid, .name,
                        Company_HAS_PARENT_COMPANY_Company:
                            [(company)-[:HAS_PARENT_COMPANY]->(parentCompany:Company)
                                | parentCompany{.uuid, .name}]
                    }]
        }
        ORDER BY user.lastName, user.firstName
        """)
List<User> getAllUsers();

With that query in place, you can just as simply write

List<User> users = userRepository.getAllUsers();

in your service method, but now:

  • you get exactly the data you need for your REST client - and nothing more
  • you get the result with ONE SINGLE call to the Neo4j database, i.e. with the best possible performance
  • Spring Data will not waste time recursively walking through your model (sub graph) in an uncontrolled way to resolve data you do not need
  • there is no chance of a StackOverflowError here anymore, because the parentCompany of a parentCompany is no longer resolved at all.
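
The last point can be illustrated with a small sketch in plain Java. It is not how Spring Data Neo4j resolves relationships internally; the Company class and both describe methods are hypothetical and only show why unbounded recursion over a cyclic parent chain overflows the stack, while stopping at a fixed depth (as the query above does after one parent level) cannot.

```java
/** Toy illustration of a cyclic parent chain (no Spring, no Neo4j). */
final class CycleDemo {

    /** Hypothetical, stripped-down stand-in for the Company node. */
    static final class Company {
        final String name;
        Company parent;

        Company(String name) {
            this.name = name;
        }
    }

    /** Follows parent links without any limit; never terminates normally on a cycle. */
    static String describeUnbounded(Company company) {
        if (company.parent == null) {
            return company.name;
        }
        return company.name + " -> " + describeUnbounded(company.parent);
    }

    /** Follows at most maxDepth parent links, so a cycle cannot cause endless recursion. */
    static String describeBounded(Company company, int maxDepth) {
        if (company.parent == null || maxDepth == 0) {
            return company.name;
        }
        return company.name + " -> " + describeBounded(company.parent, maxDepth - 1);
    }

    /** Builds the cycle A -> B -> A. */
    static Company cyclicCompany() {
        Company a = new Company("A");
        Company b = new Company("B");
        a.parent = b;
        b.parent = a;
        return a;
    }

    /** Returns true if the unbounded traversal ends in a StackOverflowError. */
    static boolean unboundedOverflows(Company company) {
        try {
            describeUnbounded(company);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }
}
```

With the cycle A -> B -> A, the unbounded variant ends in a StackOverflowError, while describeBounded(cyclicCompany(), 1) stops after one parent level and returns "A -> B".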

From our experience we can say that we improved the performance of such GET calls by a factor of 20 to 100, depending on the complexity of the (sub-)model and the amount of data.

I hope this small demo helps some developers here to use Spring Data more efficiently, with higher performance and fewer errors from the automated resolving of cyclic dependencies in sub graphs.