Hi, I've been using Neo4j for quite a while now, so I'm familiar with the basics.
My dataset consists of multiple CSVs (node and edge files), which I import using the bulk loader.
It contains various columns with integer and string values. I now need to index several of them.
Before that, here are some stats:
Total server RAM: 128 GB
Total CSV size: 34 GB
### neo4j.conf -> configured as per results from neo4j-admin memrec
Initial & max heap size: 31 GB
Page Cache size: 78 GB
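For reference, these settings would look roughly like this in neo4j.conf. This is a sketch that assumes the Neo4j 4.x setting names; in 5.x the same settings live under `server.memory.*`, so the exact keys depend on the version.

```properties
# Assumed Neo4j 4.x setting names; values are the ones listed above.
dbms.memory.heap.initial_size=31g
dbms.memory.heap.max_size=31g
dbms.memory.pagecache.size=78g
```

Heap plus page cache adds up to 109 GB, which leaves about 19 GB of the 128 GB for the OS and everything else.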
Now when I load the data and start the Neo4j server, RAM usage initially rises to 32 GB, which is reasonable given the heap size. But when I index a column from the node CSV (it contains integers ranging from 1 to 4):
CREATE INDEX word FOR (t:TOK) ON (t.word);
RAM usage jumps to ~65 GB. When I then index other columns with the same datatype, RAM usage increases by only 3 GB or less for each new column.
I tried changing the order in which I index the columns, but the general pattern is the same: the first index takes up the most RAM, while the subsequent ones take significantly less.
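In case it helps with reproducing this, here is a sketch of how the RAM reading can be taken after each `CREATE INDEX`, so that the number reflects a fully populated index. It assumes Neo4j 4.2 or later (for `SHOW INDEXES`), and the 300-second timeout is an arbitrary value chosen for illustration.

```cypher
// Wait up to 300 seconds for all indexes to finish populating
// (the timeout value is an assumption, adjust as needed).
CALL db.awaitIndexes(300);

// List the indexes and confirm the new one is ONLINE before reading RAM usage.
SHOW INDEXES;
```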
Now here are my questions:
- What performance implications does the amount of allocated memory have for my indexed data?
- How does neo4j decide the amount of memory it allocates for the indexes?
- What's the best way to distribute memory among the various indexes?