Neo4j Database Concepts

Hi

I have the following questions:

  1. What is Neo4j's maximum number of database connections?

Based on Config.MaxConnectionPoolSize, it seems that you can have infinite connections. Is that correct? Is there a formula to determine how many connections should be set? Also, is this property the same for both Community and Enterprise editions?

https://neo4j.com/docs/api/dotnet-driver/current/html/71b997d2-93b2-9a9b-0601-30e34fcf6e2d.htm
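As far as I know there is no official sizing formula; a common heuristic is to size the pool for the peak number of concurrent queries per application instance, since each in-flight query holds one pooled connection. A minimal sketch (the function name and the ~10% headroom figure are my own assumptions, not anything from the driver docs):

```python
def suggested_pool_size(peak_concurrent_queries: int) -> int:
    """Heuristic: one connection per concurrently running query,
    plus roughly 10% headroom for retries and routing traffic."""
    headroom = max(1, peak_concurrent_queries // 10)
    return peak_concurrent_queries + headroom

# e.g. an app instance that runs at most 100 queries at once:
print(suggested_pool_size(100))  # 110
```

The result would then go into the driver config (e.g. MaxConnectionPoolSize in the .NET driver), which is a per-driver-instance setting, so multiply by the number of app instances when thinking about total load on the server.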

  2. What is Neo4j's maximum database size per server / cluster?

Based on the licensing info, you can have 34B nodes in Community and no limit in Enterprise Edition. However, it is hard to grasp how much that actually is. What is the maximum database size in TB / PB?

  3. Does Neo4j have secondary indexes?

There is documentation about Neo4j indexes; however, it's unclear to me whether Neo4j has the concept of a secondary index.

The main difference between a primary and a secondary index is that a primary index is built on a set of fields that includes the primary key and contains no duplicates, while a secondary index is any index that is not a primary index and may contain duplicates.

  4. What are the limitations of running on Docker / a VM / bare metal?

@andrew.bowman, @david.allen, it would be great to get some answers to these. I couldn't find them in the forum.

We haven't seen any PB dbs. Definitely single digit TB. I don't think we've seen dbs get into double digit TBs though.

Keep in mind that by design graph dbs lean toward normalized data, so there may be less redundant data needed in a graph db.

Neo4j doesn't have a concept of primary keys. We support unique constraints (which are index-backed automatically) as well as normal indexes (where the property isn't unique, so this would be equivalent to your secondary indexes).
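To make that mapping concrete, here is a sketch of the two corresponding Cypher statements (Neo4j 4.x syntax; the `Person` label and property names are illustrative), held as Python strings the way they would be passed to a driver session:

```python
# "Primary-index" equivalent: a unique constraint, automatically index-backed.
UNIQUE_EMAIL = "CREATE CONSTRAINT ON (p:Person) ASSERT p.email IS UNIQUE"

# "Secondary-index" equivalent: a plain index where duplicates are allowed.
NAME_INDEX = "CREATE INDEX FOR (p:Person) ON (p.name)"

# Against a live server each would be executed with something like
# session.run(UNIQUE_EMAIL) — omitted here since no server is assumed.
```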

@andrew.bowman

Is the single-digit-TB maximum an observation from Neo4j Aura Cloud? Does that also apply to a self-hosted Enterprise Edition Neo4j DB? Because if I use my own hardware, the maximum DB size theoretically depends only on my own storage size, correct?

Secondly, can I confirm that Neo4j doesn't fully satisfy the CAP theorem?

  • Consistency: Yes
  • Availability: No, because it offers causal availability and not high availability
  • Partition tolerance: Yes, because sharding is available via Neo4j Fabric

Aura Professional allows up to 256 GB storage currently. You can contact us to ask for more info about scaling with Aura Enterprise.

For self-hosting, you are correct: your own hardware would be the limitation. You may need to enable the High Limit record format, though, to deal with the number of elements:

Note: Standard is the default in Enterprise and the only format available in Community. It has a limit of 34B nodes, 34B relationships, 68B properties.

High Limit is only available in Enterprise and supports virtually unlimited numbers of nodes, relationships and properties.

CAP is more about registers than databases, but to approximate, in a causal cluster we favor CP over A.

Causal clusters provide causal consistency (for reads, when using bookmarks explicitly or when executing queries within the same session) or eventual consistency (when not using bookmarks, a follower or read replica may not yet be caught up with the latest transactions). Queries directed to the leader or a single standalone instance always have read-committed consistency.

Ignoring Fabric for the moment and just looking at a causal cluster, the Raft algorithm used by causal clustering grants us resiliency in the face of network partitions and grants us some degree of availability in the formula of M = 2F + 1 where M is the number of Core Servers required to tolerate F faults. Or, put more simply, we perform quorum commits, and quorum is required for both leader elections, and to keep write capability.
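The M = 2F + 1 relationship can be sketched directly (trivial helpers for illustration, not part of any Neo4j API):

```python
def cores_for_fault_tolerance(f: int) -> int:
    """Minimum number of Core Servers M needed to tolerate F faults."""
    return 2 * f + 1

def faults_tolerated(m: int) -> int:
    """Largest F a cluster of M cores can lose while keeping quorum."""
    return (m - 1) // 2

print(cores_for_fault_tolerance(1))  # 3
print(faults_tolerated(5))           # 2
print(faults_tolerated(4))           # 1 (an even core count buys no extra tolerance)
```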

If we lose quorum, then the cluster sacrifices (write) availability, and falls into a read-only mode in order to preserve consistency until quorum is restored. As long as quorum is retained, we can suffer core failures and still offer write availability.
