Optimizing Graph Database Performance on High-Performance PC Desktops

Hello Neo4j Community,

I’ve been working with graph databases for a while and recently decided to tune my system to better handle complex graph queries. Graph databases like Neo4j are powerful, but optimizing them on a desktop setup for speed and reliability is crucial, especially when working with large datasets and complex, long-running queries.

In my case, the challenge lies in balancing the hardware and software configurations, especially when using a high-performance PC desktop setup. Specifically, I am concerned about how to ensure smooth performance when executing complex Cypher queries and running large graph processing operations. My current setup includes advanced processors and a high-end GPU, but I have found that scaling these resources for graph database performance often requires more than just raw hardware power.

I’d love to hear from others who have optimized their PC desktop configurations for graph databases. What hardware and software considerations should I focus on to maximize Neo4j throughput in such environments? Additionally, are there specific configurations or tools for managing memory consumption and optimizing queries that you’ve found particularly useful?

Looking forward to hearing about your experiences and suggestions!

Optimizing Neo4j for a "workload shape" is indeed a fun task. Here are some pointers, assuming you are mainly working off your own workstation (don't use this for a production workload).

Some of my assumptions for a workstation:

  • Tune for one user (me :) )
  • Live with the fact that I have a limited number of hard drives
  • Allocate memory based on the situation (most of my experiments involve writing new data and running GDS)

My neo4j.conf ends up looking like this:

# I prioritize heap for GDS and large queries
server.memory.heap.initial_size=30g
server.memory.heap.max_size=30g
# Just a fraction of what the graph takes on disk
server.memory.pagecache.size=5g
db.tx_log.rotation.retention_policy=2 days 2G
# I sometimes project graphs straight from Python
gds.arrow.enabled=true
gds.arrow.listen_address=0.0.0.0:8491
# I sometimes collect metrics to better understand what to tune
server.metrics.prometheus.enabled=true
server.metrics.prometheus.endpoint=0.0.0.0:2004
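
A note on why I weight heap so heavily: GDS keeps its in-memory projections and algorithm state on the heap, while the page cache only serves the store files. A typical GDS round trip looks something like this (graph name, label, and relationship type are just placeholders):

// Project a named in-memory graph; this lives on the JVM heap
CALL gds.graph.project('demo', 'Person', 'KNOWS');

// Run an algorithm against the projection
CALL gds.pageRank.stream('demo')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10;

// Drop the projection when done to give the heap back
CALL gds.graph.drop('demo');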

Stop any databases that you are not using at the moment. Set dbms.memory.transaction.total.max if you manage to OOM frequently (I don't, for some reason).
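
If you want to see where the memory goes before capping it, SHOW TRANSACTIONS can point out the hungry ones; I believe the heap estimate column is there in 5.x, but double-check the column names on your version:

// List running transactions, biggest estimated heap consumers first
SHOW TRANSACTIONS
YIELD transactionId, currentQuery, estimatedUsedHeapMemory
ORDER BY estimatedUsedHeapMemory DESC;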

That should get you started. There is more to dive into if you use vector indexes (newer JVMs have flags that can increase performance). But usually, getting the memory right is the most important part.

Most workstation SSDs are quite slow in IOPS, so always batch your writes.
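
For example, when loading data, commit in chunks with CALL ... IN TRANSACTIONS instead of one giant transaction (the file and properties here are made up; this needs to run as an implicit/auto-commit transaction, e.g. with :auto in Browser):

// Commit every 10k rows instead of building one huge transaction
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
CALL {
  WITH row
  MERGE (p:Person {id: row.id})
  SET p.name = row.name
} IN TRANSACTIONS OF 10000 ROWS;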

Use the parallel runtime for queries that need to chew through a lot of data (it only applies to read queries).
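
You can force it per query with a prefix; as far as I know this needs a reasonably recent 5.x (5.13+) and Enterprise:

// Ask for the parallel runtime on a heavy read query
CYPHER runtime=parallel
MATCH (p:Person)-[:KNOWS]->(friend)
RETURN p.name AS name, count(friend) AS degree
ORDER BY degree DESC
LIMIT 10;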

For hardware/software considerations:

  • IOPS/disk speed (M.2 NVMe)
  • Memory over CPU core count, normally
  • Linux ftw