Why did graph databases like Neo4j take so much longer to be adopted than RDBMSs?


(Shekhar Arya) #1

I was recently introduced to the graph database Neo4j and was amazed by its speed, performance, capabilities, and above all its ease of model development!

"What you design on the whiteboard is your database model."

Why has the development and adoption of graph databases like Neo4j been so slow until now, and why did it take so long for such a viable, performant alternative to RDBMSs to catch on?

Were we lacking in technology, analytical abilities, or graph theory? What kept the IT world away from this great database concept, and why were we pushed into the RDBMS world instead of a more logical system like graph databases?


(Andrew Bowman) #2

I imagine that the use cases and the capabilities available at each point in time drove DB development.

First there was a need to store organized data, and spreadsheet applications seemed to grow from that need, naturally emerging from pen-and-paper spreadsheet usage.

Then there was a need to store this data centrally, apply rules to it, and query it, which is where RDBMS grew from, likely taking its form from the familiarity and widespread use of spreadsheets.

Once reliable data storage, rules, and standard, effective querying were in place, the need for connected data followed. SQL join tables filled that need for a long while, using the technologies already in use as well as the familiar SQL query language. This works well mainly when the connections are simple, limited, and known in advance.

I believe it is only in the last decade or two that the need to query connected data efficiently has exceeded the capabilities provided by RDBMS. We can see this with recursive queries (friend of a friend of a friend of a...), and even more so when you do not know what types of nodes/tables you're joining together (reachability queries, or queries looking for connections when you don't know how things are connected), which is beyond the basic capabilities of join tables.
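To make the friend-of-a-friend case a bit more concrete, here is a small sketch in Python using only the standard library (`sqlite3`, `collections`). The social graph, the `friends` table, and the helpers `reachable_sql` and `reachable_traversal` are all hypothetical, made up for illustration. It answers the same question ("who can person 1 reach within N hops?") twice: once as a recursive CTE over a join table (recursive CTEs have been part of standard SQL since SQL:1999), where every hop is another join of the working set against the `friends` table, and once by walking stored neighbour lists, which is roughly the access pattern a native graph store is designed around. It is a toy and makes no attempt to model Neo4j's actual storage engine.

```python
import sqlite3
from collections import deque

# Hypothetical social graph: each pair is one directed "friend of" link,
# stored as a row in a join table in the relational version.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 6)]


def reachable_sql(conn, start, max_depth):
    """People reachable from `start` within `max_depth` hops, using a
    recursive CTE: each recursion step joins the current frontier
    against the friends join table once more."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(person, depth) AS (
            SELECT friend, 1 FROM friends WHERE person = ?
            UNION
            SELECT f.friend, r.depth + 1
            FROM reach AS r
            JOIN friends AS f ON f.person = r.person
            WHERE r.depth < ?
        )
        SELECT DISTINCT person FROM reach WHERE person != ?
        """,
        (start, max_depth, start),
    ).fetchall()
    return sorted(p for (p,) in rows)


def reachable_traversal(adjacency, start, max_depth):
    """Same question answered by breadth-first search over stored
    neighbour lists (no table-wide join per hop)."""
    seen = {start}
    found = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand past the hop limit
        for nxt in adjacency.get(node, []):
            if nxt not in seen:  # BFS visits each node at its shortest depth
                seen.add(nxt)
                found.add(nxt)
                frontier.append((nxt, depth + 1))
    return sorted(found)


# Relational representation: an in-memory SQLite join table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (person INTEGER, friend INTEGER)")
conn.executemany("INSERT INTO friends VALUES (?, ?)", EDGES)

# Graph-style representation: each person keeps a list of direct friends.
adjacency = {}
for a, b in EDGES:
    adjacency.setdefault(a, []).append(b)

if __name__ == "__main__":
    print(reachable_sql(conn, 1, 2))
    print(reachable_traversal(adjacency, 1, 2))
```

Both calls return `[2, 3, 6]` for person 1 within two hops. The SQL stays compact here only because there is a single, known relationship type; once the path may cross several join tables whose types you don't know in advance, the query has to enumerate them, which is the reachability problem described above.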

The thing is, companies have been solving these problems on their own for a while out of necessity (think Google, LinkedIn, Facebook, etc.), often using their own proprietary approaches, their own private graph (or proto-graph) DBs and systems, when RDBMS wasn't enough to meet their performance or modeling needs.

We can even see a few early public solutions that fit the description of a graph DB (such as Objectivity/DB, created in the early 1990s), even if they didn't call themselves that.

The other part of this is an easy-to-use query language that reflects simple modeling. The prevalence of SQL and the growing reliability of RDBMSs fed and sustained their popularity. For table-based data, SQL is THE language. Its standardization made it easy to incorporate into education, which in turn ensured de facto SQL fluency across the industry.

But the complexity of queries over connected data, as well as the complexity of modeling intricately connected data, have pushed SQL and RDBMS to their limits. Tabular modeling just isn't a good fit for this. We believe the easy modeling of data in Neo4j and the natural feel of the Cypher query language have been instrumental to its success. In Cypher, we can express in a few lines what may take tens or hundreds of lines of SQL; it expresses patterns of connected data more naturally, simply, and powerfully.

So in short, RDBMS filled a more immediate need for storing and querying data in a form that matched earlier, familiar usage (spreadsheets). Only once that need was met did the industry begin to realize that working with connected data was the next step, and that the tabular modeling and querying used so far weren't a great fit for these new use cases.

RDBMS has had decades of work put into its stabilization and optimization, and SQL has long been the industry-standard language. DBs were more or less synonymous with SQL-based RDBMSs until the NoSQL movement gained momentum, and only recently has the need for connected data, simple modeling, and simple, performant querying outweighed those advantages. Keep in mind also that the volume of stored data has increased drastically over the decades, so the weak points of RDBMS for storing and querying connected data have only recently been magnified, by the sheer volume and variety of that data, to the point where alternatives are necessary.

Graph DBs are still relatively young compared to RDBMS. We're now seeing steps toward an industry-standard language, GQL, which will be heavily based on Cypher and Cypher-influenced languages. That should give graph DBs an advantage similar to the one RDBMSs gained when SQL was established as a standard. Different modes of storage and clustering are also emerging, competing, and evolving, since these problems are more complex for connected graph data than for the tabular formats of RDBMS.


(Shekhar Arya) #3

@andrew.bowman Thanks for the well-defined answer. I feel the cost of memory has also played a pivotal role in the acceptance of RDBMSs over graph DBs, since graph DBs tend to keep a larger portion of their data in memory than RDBMSs do. As RAM gets cheaper, more and more in-memory DBs and graph DBs will gain acceptance.