tl;dr: Large Neo4j dataset that I'm trying to load into the page cache, but only part of the db gets loaded. Queries with count are very slow. Sysinfo and memrec report different data volumes. How do I load the entire graph into memory and reduce db hit counts?
I have a 25 GB Neo4j 3.5.12 Enterprise database running on a 52 GB server. I have currently set the page cache to 28 GB and the heap to 12 GB in the conf file. I want to load the entire dataset into the cache for fast querying.
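For reference, this is roughly what those settings look like in neo4j.conf (a sketch using the values above; the setting names are the standard Neo4j 3.5 ones):

```
# Page cache for store files (data + indexes)
dbms.memory.pagecache.size=28g
# Heap for query execution and transaction state
dbms.memory.heap.initial_size=12g
dbms.memory.heap.max_size=12g
```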
1. Even after warming up with apoc.warmup.run(true, true, true), my queries still hit the database millions of times. I have also tried basic scans like MATCH (n)-[r]-(m) RETURN count(r), count(n.name), but the same problem persists.
Here is the result after the APOC warmup:
This is a sample profile query:
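For completeness, the warm-up I ran looks like this (a sketch; as I understand the APOC docs, the three booleans are loadProperties, loadDynamicProperties and loadIndexes):

```cypher
// Load node/relationship stores plus properties, dynamic properties and indexes
CALL apoc.warmup.run(true, true, true);

// Manual alternative: touching a property pulls the property store into the
// page cache, which a bare count(n) alone may not
MATCH (n) RETURN count(n.name);
```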
2. Another thing that is very hard to understand: memrec and sysinfo seem to show different results. Memrec says I have 7.7 GB of data and indexes, while sysinfo and du say I have 25 GB. Exactly matching memrec's numbers, Neo4j is only using approx. 8 GB (page cache) + 12 GB (heap):
Can someone please help me understand what I am doing wrong here? How can I get all the graph's nodes and relationships into the page cache?
This is a situation where Halin would help:
In general what you'll see with the page cache is that as you do queries, data will be loaded from disk and put into the page cache as long as you have space for it in the page cache. Warmup is a good start but it doesn't guarantee to fill the page cache per se.
What I'd be curious to see is after your warmup, what Halin says about your page cache utilization %.
Thanks a lot for your response!
Halin shows the same values as sysinfo and memrec: 7.7 GB of page cache used, i.e. 26% utilization.
Also about this point:
"In general what you'll see with the page cache is that as you do queries, data will be loaded from disk and put into the page cache as long as you have space for it in the page cache. Warmup is a good start but it doesn't guarantee to fill the page cache per se."
After working on the problem through the day, I realized that my graph.db store size is indeed 4.4 GB and the indexes are 3.3 GB, which accounts for memrec's 7.7 GB. The rest is transaction logs. Does this mean the entire dataset is already in the page cache?
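If the extra ~17 GB really is transaction logs, one option (my assumption, not something confirmed in this thread) would be to prune them via the retention setting in neo4j.conf, e.g.:

```
# Example value only: keep at most 2 days of transaction logs
dbms.tx_log.rotation.retention_policy=2 days
```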
The problem then seems to be with the count aggregation:
MATCH (t:Track)-[:HAS_GENRE]->(g:Genre)
WHERE g.name IN ['rock', 'metal']
RETURN t.name, collect(DISTINCT g.name) AS genres, count(DISTINCT g) AS score
ORDER BY score DESC LIMIT 20
If I remove the count and collect from here, the query returns instantly, but with these aggregations it hits the db millions of times.
Any ideas on what might be going wrong here? How can I optimize this?
Here is the profile:
Notice that even though the genre names appear to be cached, the db is still being hit millions of times on every run, despite so much page cache still being available.
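One rewrite I plan to try (a sketch, assuming an index exists on :Genre(name)): anchor on the two genre nodes first via the index, then expand to tracks, so the DISTINCT aggregation runs over far fewer rows:

```cypher
// Index lookup on the two genres first, then expand inward to tracks
MATCH (g:Genre) WHERE g.name IN ['rock', 'metal']
MATCH (t:Track)-[:HAS_GENRE]->(g)
RETURN t.name,
       collect(DISTINCT g.name) AS genres,
       count(DISTINCT g) AS score
ORDER BY score DESC
LIMIT 20
```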