Hi all.
I'm using Neo4j 4.4.4 Community Edition, running in Docker and deployed on Kubernetes. The application is written in Python.
Resources reserved for the pod: 80 GB RAM, 10 CPUs, 80 GB ephemeral storage. In neo4j.conf: 46 GB page cache and automatically allocated heap size.
The problem is the following: when the DB starts up it consumes around 28 GB; under workload, memory usage keeps growing until it hits the pod's memory limit. Should I strictly follow the suggestions from neo4j-admin memrec?
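For comparison, here is a neo4j.conf sketch that pins the heap explicitly instead of leaving it automatic. The numbers below are illustrative only, not actual memrec output, and would need tuning for this pod:

```properties
# Illustrative values only; run `neo4j-admin memrec --memory=80g` for real numbers.
dbms.memory.heap.initial_size=24g
dbms.memory.heap.max_size=24g
dbms.memory.pagecache.size=46g
# Cap total memory available to transactions so queries cannot grow unbounded (4.2+):
dbms.memory.transaction.global_max_size=8g
```

With heap, page cache, and transaction memory all capped, the remaining headroom in the 80 GB pod is left for OS page cache and native overhead.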
I'm using the HTTP API with transactional commit, which gives us pretty good performance for sending a large number of small simultaneous requests. The query is basically always the same, but the input data (starting nodes) changes constantly.
Here is an example of the query:
MATCH (source_node:Person) WHERE source_node.name IN $inputs
MATCH (source_node)-[r]->(child_id:InternalId)
WHERE r.valid_from <= datetime($actualdate) < r.valid_to
WITH [type(r), toString(date(r.valid_from)), child_id.id] as child_path, child_id, false as filtered
OPTIONAL MATCH p_path = (child_id)-[:HAS_PARENT_ID*0..50]->(parent_id:InternalId)
WHERE all(a in relationships(p_path) WHERE a.valid_from <= datetime($actualdate) < a.valid_to) AND
NOT EXISTS{ MATCH (parent_id)-[q:HAS_PARENT_ID]->() WHERE q.valid_from <= datetime($actualdate) < q.valid_to}
WITH DISTINCT last(nodes(p_path)) as i_source,
reduce(st = [], q IN relationships(p_path) | st + [type(q), toString(date(q.valid_from)), endNode(q).id])
as parent_path, CASE WHEN length(p_path) = 0 THEN NULL ELSE parent_id END as parent_id, child_path
OPTIONAL MATCH (i_source)-[r:HAS_ISSUER_ID]->(issuer_id:IssuerId)
WHERE r.valid_from <= datetime($actualdate) < r.valid_to
RETURN DISTINCT CASE WHEN issuer_id IS NULL THEN child_path + parent_path + [type(r), NULL, "NOT FOUND IN RELATION"]
ELSE child_path + parent_path + [type(r), toString(date(r.valid_from)), toInteger(issuer_id.id)]
END as full_path, issuer_id, issuer_id IS NULL as filtered
And an example of the request (note that requests takes keyword arguments, i.e. json=... and headers=...):
import requests

result = requests.post(
    "http://neo4j.hostname.com:7474/db/neo4j/tx/commit",
    json=json_data,   # transactional payload: {"statements": [...]}
    headers=headers,  # e.g. Authorization and Content-Type
).json()
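For completeness, this is how I build the json_data payload. The transactional endpoint expects a list of statements, each with its own parameters map; the query string and parameter values below are shortened placeholders for the full query above:

```python
def build_tx_payload(query, params):
    # /db/neo4j/tx/commit accepts {"statements": [...]}; several statements
    # can be batched into a single request if needed.
    return {"statements": [{"statement": query, "parameters": params}]}

# Placeholder query and inputs -- substitute the full query from above.
json_data = build_tx_payload(
    "MATCH (source_node:Person) WHERE source_node.name IN $inputs RETURN source_node",
    {"inputs": ["Alice"], "actualdate": "2023-01-01T00:00:00"},
)
```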
When memory consumption hits the limit, performance drops rapidly.
1. Could you please explain why exactly this happens and how to avoid the memory growth? Does the performance drop because of GC?
2. Can I use the Python driver, with its transaction-commit option, instead of the HTTP API?
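Regarding question 2, a minimal sketch of what I have in mind with the official Neo4j Python driver (pip install neo4j). The URI, credentials, and the shortened query string are placeholders, not our real values:

```python
def build_params(inputs, actual_date):
    # Same parameter names the HTTP payload uses ($inputs, $actualdate).
    return {"inputs": inputs, "actualdate": actual_date}

def run_query(uri, user, password, inputs, actual_date):
    # Official Neo4j Python driver; imported inside the function so the
    # pure helper above stays usable without the package installed.
    from neo4j import GraphDatabase

    # Shortened placeholder -- substitute the full query from the post.
    query = ("MATCH (source_node:Person) WHERE source_node.name IN $inputs "
             "RETURN source_node")
    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session(database="neo4j") as session:
            # read_transaction runs the work in a managed transaction and
            # retries on transient errors (4.x driver API).
            return session.read_transaction(
                lambda tx: tx.run(query, build_params(inputs, actual_date)).data()
            )
    finally:
        driver.close()
```

The driver uses the binary Bolt protocol with connection pooling, so it should also handle many small concurrent requests well.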