Optimizing Neo4j Database Storage for Large-Scale Servers

Hello everyone,

I'm currently deploying Neo4j in a large-scale server environment, and I'm seeking advice on optimizing its storage performance. Given the growing size of the database and the complexity of our queries, I'm running into disk I/O bottlenecks and overall performance degradation.

Here are a few areas I'm focusing on, and I would love your insights or suggestions:

  1. Hardware considerations: What are the best practices for choosing the right disk type (e.g., SSD vs. HDD)? Are there specific configurations, such as RAID levels, that work better for Neo4j's storage requirements?
  2. Memory and disk cache: How much memory should be allocated to the page cache, and should we increase it beyond the default settings for large-scale deployments?
  3. Indexing strategy: How do you optimize indexes for high-performance storage, especially when dealing with large graphs and high-cardinality properties? Are there specific types of indexes that work better in such environments?
  4. Graph partitioning: Has anyone successfully implemented partitioning in Neo4j for distributed storage across multiple servers? Any performance tips or caveats?
  5. Backup and recovery: How do you handle backups efficiently without impacting the performance of the live system, especially with large volumes of data?
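
For context on point 2, here's roughly what I have in `neo4j.conf` today (Neo4j 5.x property names; the sizes are placeholders from my current box, not recommendations):

```properties
# Heap for query execution and transaction state (placeholder values)
server.memory.heap.initial_size=16g
server.memory.heap.max_size=16g

# Page cache should ideally hold the hot portion of the store files;
# 64g is simply what fits on this machine alongside the heap and OS
server.memory.pagecache.size=64g
```

I arrived at these by running `neo4j-admin server memory-recommendation` and rounding down, but I'm not sure that approach holds up once the store grows well past available RAM.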

Looking forward to hearing your experiences and any additional tips you might have. Thanks in advance!