Is there documentation for Neo4j's graph storage?

storage
(Andi Stefan) #1

I work with Neo4j to store a huge and dense knowledge graph. I want to build a polyglot persistence architecture to speed up graph algorithms. To do this, I need to know all the details of Neo4j's native graph storage.

This is documented in Chapter 6 of the 2nd edition of the book "Graph Databases" by Ian Robinson, Jim Webber & Emil Eifrem. The problem is that the graph storage described there refers to version 2.2 of Neo4j, because the book was written in 2015. Is there up-to-date documentation (for version 3.5.2) somewhere?

(Michael Hunger) #2

What is the size and shape of your knowledge graph?

What kind of algorithms do you want to run?

The storage is an implementation detail; there are different approaches, and in the future there will be even more options.

(Andi Stefan) #3

Thank you for the reply.

I expect the knowledge graph to have several million nodes. The graph has a complex structure and stores documents, the authors of these documents, journals, keywords referenced by the documents and so on (based on data from PubMed).
I want to run algorithms that have to check the attribute values of the nodes.

What I actually want to know is whether the following statement from the book "Graph Databases" (details above), on pages 156 and 157, is still up to date:
...

For each property’s value, the record contains either a pointer into a dynamic store record or an inlined value. The dynamic stores allow for storing large property values. There are two dynamic stores: a dynamic string store (neostore.propertystore.db.strings) and a dynamic array store (neostore.propertystore.db.arrays).

...

Neo4j supports store optimizations, whereby it inlines some properties into the property store file directly (neostore.propertystore.db). This happens when property data can be encoded to fit in one or more of a record’s four property blocks. In practice this means that data like phone numbers and zip codes can be inlined in the property store file directly, rather than being pushed out to the dynamic stores. This results in reduced I/O operations and improved throughput, because only a single file access is required.

...
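
To make the quoted passage concrete, here is a rough Python sketch of the inline-or-spill decision it describes. It is only a toy model, not Neo4j's actual encoding: the four blocks per record come from the quote, but the 8-byte block size, the plain UTF-8 encoding, and the names `DynamicStore`, `blocks_needed` and `store_value` are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

BLOCKS_PER_RECORD = 4  # "a record's four property blocks" (from the quoted passage)
BLOCK_SIZE_BYTES = 8   # ASSUMPTION: block size is not stated in the passage


@dataclass
class DynamicStore:
    """Hypothetical stand-in for a dynamic store file (strings or arrays)."""
    records: List[bytes] = field(default_factory=list)

    def append(self, payload: bytes) -> int:
        # Store the payload and return its record id, which acts as the pointer.
        self.records.append(payload)
        return len(self.records) - 1


def blocks_needed(payload: bytes) -> int:
    # Ceiling division; even an empty value occupies one block in this toy model.
    return max(1, -(-len(payload) // BLOCK_SIZE_BYTES))


def store_value(value: str, dynamic: DynamicStore) -> Tuple[str, Union[bytes, int]]:
    """Inline the value if it fits in the record's blocks, else spill it."""
    payload = value.encode("utf-8")  # ASSUMPTION: real encodings are more compact
    if blocks_needed(payload) <= BLOCKS_PER_RECORD:
        return ("inline", payload)               # one file access on read
    return ("pointer", dynamic.append(payload))  # second access to the dynamic store


if __name__ == "__main__":
    strings = DynamicStore()
    print(store_value("90210", strings))    # short zip code: stays inline
    print(store_value("x" * 100, strings))  # long text: pointer into dynamic store
```

Under these assumptions, anything up to 32 bytes stays in the property record, and anything longer costs an extra lookup in the dynamic store. That extra lookup is the I/O difference the passage refers to, and it is what matters for algorithms that read many node properties.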
