I have been playing around with Neo4j for a few months and am looking for feedback on whether it can actually be used for a web app, similar to, e.g., Firebase Database (in terms of performance, integration, scalability, etc.). What I have in mind is a very simple web app where the user can consult information stored in the DB, run some operations on a Python back end (typically data analysis or statistical modelling: K-means, things like that), and write new information to the DB. The trick is that it should be able to support many users (5k+) at the same time.
My question is simply whether Neo4j can actually be used in production or whether there are real limitations. Are there any known web apps that have used it at scale as their primary production DB? What troubles lie ahead?
Some context on the architecture I was thinking of (I am not asking to validate the stack in this thread, given the lack of context on ops; it's just to give a flavor, but of course your opinion is welcome):
DB back end (read-write): Neo4j 3.5.x
Modelling engine: Python 3.8
Search engine (if needed): Elasticsearch 7.4
Web framework: Flask 1.1
Front end: React 16.10 or something similar
PaaS: Docker 19.x
Container orchestration: Kubernetes 1.16
Yes, it can be used as the primary production database. If you want a list of examples, simply watch Emil's opening keynote from GraphConnect 2018. There are several other companies that I've heard of using it: eBay with their ShopBot, CenturyLink for their customer service 360 app, and Adobe Behance. So yes, it is a production-grade database.
The only question you'll need to solve, as with any DB, is sizing the hardware. Like any DB, it will struggle if you throw 5k concurrent users at it without providing enough resources and tuning your queries. But size the hardware appropriately (which in Adobe's case was smaller than what other DBs required) and it'll be a dream. If you purchase an Enterprise license, you can use clustering with scale-out read replicas to handle any workload you throw at it.
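To give a flavor of how the read/write split looks from a Python back end, here is a minimal sketch. It assumes the official `neo4j` Python driver and a clustered deployment; the `Point` label, the `x`, `y` and `cluster` properties, the URI and the helper names are all made up for illustration. The idea is that work submitted through `read_transaction` can be routed to followers or read replicas, while `write_transaction` goes to the leader.

```python
def _fetch_points(tx, limit):
    # Read-only unit of work: pull the data the modelling step needs.
    # The Point label and x/y properties are hypothetical.
    result = tx.run(
        "MATCH (p:Point) RETURN id(p) AS id, p.x AS x, p.y AS y LIMIT $limit",
        {"limit": limit},
    )
    return [(r["id"], r["x"], r["y"]) for r in result]


def _store_clusters(tx, assignments):
    # Write unit of work: persist one cluster id per node in a single batch.
    rows = [{"id": node_id, "cluster": cluster} for node_id, cluster in assignments]
    tx.run(
        "UNWIND $rows AS row MATCH (p:Point) WHERE id(p) = row.id "
        "SET p.cluster = row.cluster",
        {"rows": rows},
    )


def load_points(session, limit=1000):
    # With a routing driver, read transactions may be served by read replicas.
    return session.read_transaction(_fetch_points, limit)


def save_clusters(session, assignments):
    # Write transactions are sent to the cluster leader.
    session.write_transaction(_store_clusters, assignments)


def open_driver(uri, user, password):
    # Imported lazily so the helpers above can be exercised without the driver.
    # Assumption: a routing URI such as "bolt+routing://host:7687" for 3.5-era
    # drivers ("neo4j://host:7687" in later driver versions).
    from neo4j import GraphDatabase
    return GraphDatabase.driver(uri, auth=(user, password))
```

In the web app, each request would open a session from the shared driver, call `load_points`, run the K-means step in Python, and hand the resulting `(node_id, cluster)` pairs to `save_clusters`. Newer driver releases rename these session methods, so check the version you install.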
As for your stack, I'd suggest just one thing for consideration: I've been finding less of a need for Elasticsearch ever since Neo4j implemented its Lucene-based index (the same index engine that Elasticsearch uses). Not only can you get the same fuzzy text search in Neo4j as in Elasticsearch, but you can also leverage the data in the graph to provide much more sophisticated search results. At the online summit, Christophe Willemsen gave a presentation about how you can do more than just fuzzy searches.
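As a rough sketch of what a fuzzy search against that index could look like from Python, assuming Neo4j 3.5's full-text procedures (`db.index.fulltext.createNodeIndex` and `db.index.fulltext.queryNodes`): the index name `articleText`, the `Article` label and the `title`/`body` properties are made up for illustration, and `session` is assumed to be a driver session.

```python
import re

# Characters that have special meaning in Lucene query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')

# One-off setup statement (hypothetical index name, label and properties).
CREATE_INDEX = (
    'CALL db.index.fulltext.createNodeIndex('
    '"articleText", ["Article"], ["title", "body"])'
)

SEARCH = (
    "CALL db.index.fulltext.queryNodes($index, $q) YIELD node, score "
    "RETURN node.title AS title, score ORDER BY score DESC LIMIT $limit"
)


def fuzzy_query(text):
    """Escape each whitespace-separated term and mark it fuzzy with '~'."""
    terms = [_LUCENE_SPECIAL.sub(r"\\\1", term) for term in text.split()]
    return " ".join(term + "~" for term in terms)


def search(session, text, index="articleText", limit=10):
    """Run the full-text query and return (title, score) pairs."""
    params = {"index": index, "q": fuzzy_query(text), "limit": limit}
    return [(r["title"], r["score"]) for r in session.run(SEARCH, params)]
```

Because the procedure yields ordinary nodes, the `RETURN` clause can be extended with further `MATCH` patterns to rank or filter results by their graph neighborhood, which is where this goes beyond plain text search.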
Thank you so much, Mike, very insightful! I will definitely look at the Lucene index implemented in Neo4j. I think I will start developing a POC in that direction and see how it goes in a testing environment. As for hardware scalability, I am looking at solutions in GCP, so hopefully the hardware can follow the DB's needs. An Enterprise license is absolutely on the table; I have not looked at it carefully enough yet, so I will check the clustering capabilities.
Again, thank you Mike, for the insightful tips and the links to the presentations, very very useful, much appreciated!
Thanks, Paul, for posting this use case. I hope you get that app into production with Neo4j. Please do share your experience, as it would benefit the community.