
Query - large volumes in Spring Data

Node Clone

Just reaching out to the community here for best practices. Using Neo4j and large volumes of data seem rather synonymous. When we fire requests at Neo4j using neo4j-rx-spring-boot, our applications get to a point where they can no longer produce the response. I haven't been able to observe an actual error yet, but the connection is cut off, presumably because the application runs out of memory, as I don't see the issue with smaller requests.

The following setting seems to help:

Perhaps also at the pod level (we run the API in OpenShift using Docker): which settings should I be increasing if I'm getting unexplained errors on large volumes?


Node Clone

There are some real problems here: when doing multiple requests against the same API, the pods actually crash. Not good. Increasing the max in-memory size doesn't seem to do much here. Are we the first to handle large data sets with a Spring Data API on Neo4j?

As general information: the Spring Data Neo4j RX project was put on hold a few months back in favour of making it the successor to the official Spring Data Neo4j in version 6. (Currently available as Spring Data Neo4j 6.0.2, or indirectly via Spring Boot 2.4 and the official spring-boot-starter-data-neo4j.)

We also brought in some improvements to querying and processing over the past months that should benefit you. Your description of large volumes is relatively vague, but in general there is a tension between using an object-something mapper and loading a lot of data in one operation.
At least during the mapping process, the application needs to keep a reference cache of previously mapped entities so that it can reference them again later, because they might appear in another record of the same result set.
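To illustrate the point about the reference cache, here is a minimal sketch in plain Java. It is not the actual Spring Data Neo4j mapping code; the `Row` and `Person` types and the `mapRows` method are hypothetical names made up for this example. It only shows why the mapper has to hold on to every mapped entity until the whole result set has been consumed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReferenceCacheSketch {

    // Hypothetical flat record, e.g. one row of "MATCH (p)-[:KNOWS]->(f) RETURN ..."
    static final class Row {
        final long personId; final String personName;
        final long friendId; final String friendName;

        Row(long personId, String personName, long friendId, String friendName) {
            this.personId = personId; this.personName = personName;
            this.friendId = friendId; this.friendName = friendName;
        }
    }

    // Hypothetical mapped entity
    static final class Person {
        final long id; final String name;
        final List<Person> friends = new ArrayList<>();

        Person(long id, String name) { this.id = id; this.name = name; }
    }

    // Maps all rows; the cache keeps every entity alive until mapping is done,
    // because any later row may refer to an entity that was already mapped.
    static Map<Long, Person> mapRows(List<Row> rows) {
        Map<Long, Person> cache = new LinkedHashMap<>();
        for (Row row : rows) {
            Person person = cache.computeIfAbsent(row.personId, id -> new Person(id, row.personName));
            Person friend = cache.computeIfAbsent(row.friendId, id -> new Person(id, row.friendName));
            person.friends.add(friend); // reuses the cached instance instead of creating a duplicate
        }
        return cache;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row(1, "Ann", 2, "Bob"),
                new Row(2, "Bob", 1, "Ann"),
                new Row(1, "Ann", 3, "Cid"));
        Map<Long, Person> cache = mapRows(rows);
        System.out.println(cache.size() + " entities held for " + rows.size() + " records");
    }
}
```

In this sketch the cache grows with the number of distinct nodes in the result, on top of the raw records themselves, so the memory needed for one operation is larger than the size of the result alone.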

I am sorry that I cannot give you any configuration hints for tuning your environment for your current problem, because I do not have enough experience with running this kind of application on "cloud platforms".

If you can upgrade and still experience problems, please provide some more insight into your domain (number of relationships and similar) and the number of nodes you want to process in one operation.

Node Clone

We will start by upgrading to Spring Data Neo4j 6.0.2 and see whether performance is similar. I will get back to you if these problems still exist.

Let's assume we have a database with N nodes and about N*2 relationships (two ways), where a query on these N nodes amounts to 100 MB of data (nodes and relationships) with a query duration of about 2 minutes. Which configuration settings would need to be altered to allow multiple such queries to be fired at this Neo4j database at the same time? How much extra data is loaded into memory while performing a query, relative to the size of the result?

We can leave the cloud pods out of this for now: giving a pod a (close enough to) unlimited amount of memory and CPU doesn't solve the problem. As such, I'm relatively certain that this is happening at the Spring level.