Just reaching out to the community here for best practices. Neo4j and large volumes of data seem to go hand in hand. When we fire requests at Neo4j using neo4j-rx-spring-boot, our applications reach a point where they can no longer build the response. I haven't been able to observe an actual error yet; the connection is simply cut off. I presume the application runs out of memory, because smaller requests don't have the issue.
The following setting seems to help:
And at the pod level (we run the API in OpenShift using Docker): which limits should I increase if I get unexplained errors with large volumes?
Some general information first: the Spring Data Neo4j RX project was put on hold a few months back so that it could become the successor of the official Spring Data Neo4j, as version 6. (It is currently available as Spring Data Neo4j 6.0.2, or indirectly via Spring Boot 2.4 and the official spring-boot-starter-data-neo4j.)
Over the past months we have also made improvements to querying and result processing that should benefit you. Going by your rather vague description of "large volumes": working with an object-something-mapper (object-graph, object-relational, ...) and a lot of data in a single operation is usually something of a contradiction.
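A minimal sketch of the "not everything in one operation" idea, assuming the work can be split into pages: instead of loading and mapping the whole result at once, process fixed-size batches so that only one batch of mapped objects is alive at a time. `fetchPage` and the in-memory `source` list are hypothetical stand-ins for a paged query (for example one using SKIP/LIMIT); they are not Spring Data Neo4j API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedProcessing {

    // Hypothetical stand-in for a paged query (e.g. SKIP/LIMIT); here it just slices a list.
    static List<Integer> fetchPage(List<Integer> source, int offset, int pageSize) {
        int end = Math.min(offset + pageSize, source.size());
        if (offset >= end) {
            return List.of();
        }
        return new ArrayList<>(source.subList(offset, end));
    }

    // Processes the source page by page; the handler only ever sees one page at a time.
    static int processInBatches(List<Integer> source, int pageSize, Consumer<List<Integer>> handler) {
        int pages = 0;
        int offset = 0;
        while (true) {
            List<Integer> page = fetchPage(source, offset, pageSize);
            if (page.isEmpty()) {
                break;
            }
            handler.accept(page);
            offset += page.size();
            pages++;
        }
        return pages;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            ids.add(i);
        }
        long[] sum = {0};
        int pages = processInBatches(ids, 1_000, page -> page.forEach(id -> sum[0] += id));
        System.out.println(pages + " pages, checksum " + sum[0]); // prints "10 pages, checksum 49995000"
    }
}
```

The page size is the knob here: smaller pages mean less memory per step at the cost of more round trips.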
At least during mapping, the application has to keep a reference cache of already mapped entities so that it can reuse them later, because the same entity may appear again in another record of the same result set.
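To illustrate why mapping needs memory beyond the raw result, here is a simplified sketch of such a reference cache (an identity map), assuming each record carries internal node ids. `Person`, `mapRecord` and the `long[]` record shape are hypothetical, and this is not how Spring Data Neo4j implements its mapping internally. It only shows the general idea: every distinct entity stays referenced by the cache until the whole result set has been mapped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IdentityMapSketch {

    // Hypothetical entity: a node with an id and references to other entities.
    static final class Person {
        final long id;
        final List<Person> knows = new ArrayList<>();
        Person(long id) { this.id = id; }
    }

    // Cache of already mapped entities, keyed by node id; it lives for the whole result set.
    private final Map<Long, Person> cache = new HashMap<>();

    // Returns the cached instance for this id, creating it only the first time the id is seen.
    Person lookupOrCreate(long id) {
        return cache.computeIfAbsent(id, Person::new);
    }

    // Maps one hypothetical record of the form {startId, endId} describing a relationship.
    void mapRecord(long[] record) {
        Person start = lookupOrCreate(record[0]);
        Person end = lookupOrCreate(record[1]);
        start.knows.add(end);
    }

    int cachedEntities() { return cache.size(); }

    public static void main(String[] args) {
        IdentityMapSketch mapper = new IdentityMapSketch();
        long[][] records = { {1, 2}, {2, 1}, {1, 3}, {3, 2} };
        for (long[] r : records) {
            mapper.mapRecord(r);
        }
        // Node 1 appears in three records but is mapped to one shared instance.
        System.out.println(mapper.cachedEntities() + " entities cached"); // prints "3 entities cached"
    }
}
```

The cache grows with the number of distinct entities in the result, not with the number of records, which is why a highly connected result can still keep a large object graph alive.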
I am sorry that I cannot give you any configuration hints for your environment; I do not have enough experience running these kinds of applications on "cloud platforms".
If you can upgrade and still experience problems, please give us more insight into your domain (number of relationships and similar) and the number of nodes you want to process in one operation.
We will start by upgrading to Spring Data Neo4j 6.0.2 and see whether performance is similar. I'll get back to you if the problems persist.
Let's assume a database with N nodes and about 2N relationships (i.e. in both directions). A query over these N nodes returns about 100 MB of data (nodes and relationships) and takes about 2 minutes. Which configuration settings would need to change to allow multiple such queries to run against this Neo4j database at the same time? And how much extra data is loaded into memory while performing a query, relative to the size of the result?
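A back-of-envelope lower bound for the scenario above, assuming each query fully materialises its roughly 100 MB result on the heap and that the queries overlap in time: k concurrent queries need at least k × 100 MB for the results alone, before any mapping overhead such as the entity cache. The `overheadFactor` parameter is a hypothetical placeholder; the real factor depends on the domain and is exactly what would have to be measured.

```java
public class HeapEstimate {

    // Lower bound on heap (MB) for k concurrent queries that each hold a full result in memory.
    // overheadFactor >= 1.0 is a hypothetical multiplier for mapping overhead; 1.0 means "results only".
    static double estimateHeapMb(int concurrentQueries, double resultMb, double overheadFactor) {
        if (concurrentQueries < 0 || resultMb < 0 || overheadFactor < 1.0) {
            throw new IllegalArgumentException("invalid input");
        }
        return concurrentQueries * resultMb * overheadFactor;
    }

    public static void main(String[] args) {
        // Four concurrent 100 MB queries, counting the raw results only.
        System.out.println(estimateHeapMb(4, 100.0, 1.0) + " MB"); // prints "400.0 MB"
    }
}
```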
We can leave the cloud pods out of this for now: giving a pod a (close to) unlimited amount of memory and CPU doesn't solve the problem, so I'm fairly certain this is happening at the Spring level.
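One thing worth ruling out first: the JVM does not automatically use all the memory a pod is given. Its maximum heap is fixed at startup, either by `-Xmx` or, on container-aware JVMs (JDK 10+, backported to 8u191), by `-XX:MaxRAMPercentage`, which defaults to 25% of the available memory. A minimal check, assuming you can run a bit of Java inside the deployed container (for example logged at application startup), is to print what the JVM itself considers its limits:

```java
public class HeapLimits {

    // Converts bytes to mebibytes for readability.
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most the heap may grow to (from -Xmx or MaxRAMPercentage);
        // returns Long.MAX_VALUE if there is no inherent limit.
        System.out.println("max heap:   " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory(): heap currently reserved; freeMemory(): unused part of that reservation.
        System.out.println("total heap: " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("free heap:  " + toMiB(rt.freeMemory()) + " MiB");
        System.out.println("processors: " + rt.availableProcessors());
    }
}
```

If "max heap" stays the same after the pod limit is raised, the extra memory never reaches the application. In that case `-Xmx` or `-XX:MaxRAMPercentage` (for example passed via the `JAVA_TOOL_OPTIONS` environment variable, which the JVM reads at startup) would be the setting to change.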