Using neo4j for 'pure analytics'

Hi everyone,

I have only just started learning about graph databases while also learning about different types of relational databases.

In relational databases, there's a distinction between two types:

  • OLTP (online transactional processing, e.g. Postgres, MySQL) and
  • OLAP (online analytical processing, e.g. ClickHouse, DuckDB)

The key difference between them is that OLTP databases put a lot of emphasis on data integrity and hence implement full transactional guarantees, while OLAP databases deliberately relax those guarantees in exchange for better query performance, especially on typical "Big Data" analytics queries like aggregations, top-k, etc.

The difference in query execution times between the two can be extreme, especially for huge datasets (billions of data points).
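To make that contrast a bit more concrete, here is a small Python sketch (entirely hypothetical data, not tied to any specific engine) of one reason columnar OLAP engines excel at aggregation: an analytical query usually touches a single column of every row, so storing each column contiguously lets the engine scan just that column instead of walking whole rows.

```python
from array import array

N = 100_000

# Row-oriented layout (OLTP-style): each record stored as a whole tuple
# of (id, name, duration) -- an aggregation must walk every full row.
rows = [(i, f"user{i}", i % 300) for i in range(N)]

# Column-oriented layout (OLAP-style): one contiguous array per column --
# the same aggregation only scans the one column it needs.
durations = array("i", (i % 300 for i in range(N)))

# The same analytical query -- average duration -- on both layouts.
avg_row_store = sum(r[2] for r in rows) / len(rows)
avg_col_store = sum(durations) / len(durations)

assert avg_row_store == avg_col_store
```

In a real system the column layout additionally enables compression and vectorized execution, which is where the "extreme" speedups on billions of rows mostly come from.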

Is there a similar distinction in 'graph land'? Are there specific graph databases that are more suitable for analytics involving huge collections of data?

This question came up for me because in this video I heard the phrase "Neo4j isn't really an analytical database".

How much does ACID compliance impact query performance in Neo4j? Should I pick an alternative instead if I intend to use a graph database solely for analytics? If so, which ones can you recommend?

Thanks in advance for any help with that :slight_smile:

Best regards,
Samo

@Sejmou

There may be some truth in the video you linked, but note that it is nearly 7 years old and talks about Neo4j version 2.3.0. The current shipping version is Neo4j 5.10.0, so a lot has changed since then.

As to your question, part of the answer is how much of the database you can fit in RAM. But if you are running a query such as 'compute the avg duration of all phone calls in EU for years 2022 and 2023', then one way or another the engine simply has to iterate over all the matching phone calls, whether that is 100 million or 100 billion, and compute the average.