Memory issue for a query

Hi all,
I recently upgraded Neo4j from 4.4.2 to 2025.03.0. After that the query has failed:

MATCH (:Person{id:'0000001'})<-[:PARENT*0..]-(n:Person) 
WHERE date(n.when_updated) < date('2025-04-14')
RETURN n.id, n.username, n.password ORDER BY n.id_normalized;

The error:

The allocation of an extra 2.0 MiB would use more than the limit 12.0 GiB. Currently using 12.0 GiB. dbms.memory.transaction.total.max threshold reached

The number of nodes is 3.5M, whole database uses 2.5GB on disk and the container where Neo4j is running, never consumer more than 3GB.

I tried to limit the output and on 200K records it could return the result. Container's memory consumption raised to 4.5GB. At the same time profiler has shown total memory allocation 5.3GB. So, I'd expect at least twice as big output limit to be executed successfully. So, it's very confusing.

I've increased heap and page size to the point when it couldn't start because it exceeded physical RAM capacity. So the memory allocation is almost maximum possible.

What puzzles me at most is that before the upgrade same query ran without problems. Can it be fixed somehow? I am considering splitting outcome and combining results in a calling program but I'd like to avoid that if possible.

Update: I tried to get result in chunks but running same query with SKIP 3000000 LIMIT 20 added also results same error. I don't really understand why LIMIT 200000 produces result by SKIP 3000000 LIMIT 20 does not.

You cantry to change memory setting in the file neo4j.conf to:

dbms.memory.transaction.total.max=24g
dbms.memory.heap.initial_size=8g
dbms.memory.heap.max_size=8g

and to process the query in batches:

https://neo4j.com/labs/apoc/4.4/overview/apoc.periodic/apoc.periodic.iterate/

CALL apoc.periodic.iterate(
  "MATCH (:Person {id: '0000001'})<-[:PARENT*0..]-(n:Person) WHERE date(n.when_updated) < date('2025-04-14') RETURN n",
  "RETURN n.id, n.username, n.password",
  {batchSize: 5000, parallel: false}
)

BTW, when you have the following type of match MATCH (:Person{id:'0000001'})<-[:PARENT*0..]-(n:Person) , you will get a row back for all the intermediate paths, i.e.

MATCH (:Person{id:'0000001'})<-[:PARENT]-(n:Person) 
MATCH (:Person{id:'0000001'})<-[:PARENT]-()<-[:PARENT]-(n:Person) 
etc.

If you paths are very long, you are going to get a lot of intermediate paths back. Do you need these, or just the longest path? If just the longest, you can add a predicate to filter out the intermediate paths.

MATCH (:Person{id:'0000001'})<-[:PARENT*0..]-(n:Person) 
WHERE NOT EXISTS ( (n)<-[:PARENT]-()  ) and
date(n.when_updated) < date('2025-04-14')
RETURN n.id, n.username, n.password ORDER BY n.id_normalized;

I'm not sure apoc.periodic.iterate is supposed to return anything. In fact it returns result, but it's only a report of execution, not actual dataset.

And could confirm that the issue doesn't occur in version 4.4.2 but occurs in 5.26 and 2025.03.0. So it seems newer versions have memory management problems. Also I could see that in 4.4.2 the container's memory consumption quite quickly raised to 6GB, that didn't happen for 2025.03.0. So that what I'd expect full dataset would take in memory.

I don't need paths, I need all nodes, that are under the root one.

It would return only leaf nodes but I need all of them.

I get it. See if this run more efficiently.

MATCH path = (:Person{id:'0000001'})<-[:PARENT*0..]-(n:Person) 
WHERE NOT EXISTS ( (n)<-[:PARENT]-()  ) and
date(n.when_updated) < date('2025-04-14')
UNWIND nodes(path) as node
RETURN node.id, node.username, node.password 
ORDER BY node.id_normalized;

Does UNWIND loose some part of context and thus is more memory optimized?