Neo4j vs. AWS Neptune

I have someone trying to convince me that we should switch from Neo4j to AWS Neptune. Beyond the fact that we all know Neo4j is superior, can anyone give me specific talking points that would support staying on Neo4j?

The arguments from the other side are that "it's AWS, so it has to be good," that "AWS claims it's fast, so it must be," and that it's a managed service, so we wouldn't need to manage an EC2 instance, storage, or anything else.

One thing that I find important is the Cypher syntax. Because of how Cypher is structured, I think it makes communication easier among all the different parties involved in a project. The syntax bridges the conceptual and the technical, which can make it easier for people to understand what is going on within the graph itself.

The more people understand, the better the information can be shared and built upon. The Cypher syntax was by far one of the top reasons I ended up going with Neo4j over other options.

In that same vein, you have products like Bloom, which also provide good ways for "non-technical/non-developer" individuals to interact with the data.

It isn't just about being (insert fancy words and lingo), but about being approachable by all those who are interested and/or affected by the data.

6 Likes

We recently evaluated both Neptune and Neo4j before deciding to go with Neo4j.

I second Michael's endorsement of Cypher over Gremlin (Neptune's property graph query language). I find Cypher much easier to work with.
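
To make the readability comparison concrete, here is a small sketch of one question ("who are the friends of this person's friends?") written in both languages. The `Person` label, the `KNOWS` relationship, and the `name` property are made up for illustration, and the queries are simply held as Python strings so they can be handed to whichever driver you use.

```python
# Hypothetical schema for illustration: (:Person {name})-[:KNOWS]->(:Person)

# Cypher: the pattern reads almost like a whiteboard drawing of the graph.
CYPHER_FRIENDS_OF_FRIENDS = """
MATCH (p:Person {name: $name})-[:KNOWS]->(:Person)-[:KNOWS]->(fof:Person)
RETURN DISTINCT fof.name AS name
"""

# Gremlin: the same traversal, expressed as a chain of steps.
# `name` is assumed to be supplied as a binding by the caller.
GREMLIN_FRIENDS_OF_FRIENDS = """
g.V().has('Person', 'name', name).
  out('KNOWS').out('KNOWS').
  dedup().values('name')
"""

if __name__ == "__main__":
    print(CYPHER_FRIENDS_OF_FRIENDS.strip())
    print(GREMLIN_FRIENDS_OF_FRIENDS.strip())
```

Which of the two is easier to read is a matter of taste, but the Cypher version is the one that tends to survive being pasted into a slide for non-developers.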

Neo4j's APOC and Graph Algorithms libraries allow you to do things in one call that I struggled to do in a full weekend of code writing on Neptune.

If you go with Neptune, you are locked in to AWS. With Neo4j, you can move on-prem or switch cloud providers when you want to.

There are many more books, resources, and tutorials on Neo4j than there are on Neptune. I attended the re:Invent conference, and there were very few sessions that went into any depth on Neptune. Graph doesn't seem like a high priority for AWS.

If you are an AWS customer, it's very easy to spin up a Neo4j AMI on EC2. It's faster than starting a Neptune cluster on AWS.

7 Likes

I'm evaluating both options too. Why did you say "If you go with Neptune, you are locked in to AWS"? According to their documentation, they support Apache TinkerPop Gremlin and W3C SPARQL, which are not vendor-proprietary languages.

I also have a question. I see Neo4j can run on EC2, but what about Docker? I see there is an image (Neo4j with Docker - Developer Guides), and it seems to me a better option than EC2. What do you think about it?

To answer the "lock-in" question: even though the query languages, Apache TinkerPop Gremlin and W3C SPARQL, are not vendor-specific, the DB engine still is. You can't take your Neptune DB and go run it in Google Cloud or Azure. Neo4j can be run anywhere. The vendor lock-in is the DB itself. If you run on Neptune you're locked in, and if you wanted to go to Google, you would have to do a DB migration to a different technology.

Neo4j on Docker vs. EC2: in my opinion that choice is just there to support whatever your DevOps release workflow is. There shouldn't be any performance differences or anything different about the application. Do you prefer to manage an application in a more bare-metal scenario (EC2), or would you rather deploy a Docker image?
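
For the Docker route, the deployment boils down to one `docker run` invocation. The sketch below only builds that command; it does not execute it. The function name, the host data path, and the password are placeholders, and it assumes the official `neo4j` image with its usual ports (7474 for the browser, 7687 for Bolt), `/data` volume, and `NEO4J_AUTH` variable.

```python
import shlex


def neo4j_docker_command(data_dir, password, image="neo4j:latest"):
    """Build the argv for running the official Neo4j image (not executed here)."""
    return [
        "docker", "run", "--detach",
        "--publish", "7474:7474",          # HTTP / Neo4j Browser
        "--publish", "7687:7687",          # Bolt protocol used by drivers
        "--volume", f"{data_dir}:/data",   # keep the store outside the container
        "--env", f"NEO4J_AUTH=neo4j/{password}",
        image,
    ]


if __name__ == "__main__":
    # Placeholder path and password; print the command instead of running it.
    print(shlex.join(neo4j_docker_command("/srv/neo4j/data", "change-me")))
```

The same container can run on an EC2 instance, so the two options are not mutually exclusive.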

1 Like

Thanks for your answer Mike, that is a really good point. To be honest, I really like Neo4j, because I like the Cypher syntax and other points, such as visualization tools, community support, etc.

However, we don't want to deal with data management, backups, auto-scaling, etc. That is one of the reasons why we are moving to AWS.

That's the same argument made by the people trying to convince me that moving to AWS is better than staying on Neo4j: they say it's worth it not to have to worry about backups and auto-scaling. Luckily we chose to stay on Neo4j. I discount the "management" argument.

In my experience, getting backups scheduled and running is a one-time task. Once they are being taken, it's almost zero maintenance.
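
As a rough idea of how small that one-time task can be, here is a minimal sketch of a daily backup job: one function builds the backup command for a dated directory, another decides which old backups to prune. The function names and paths are hypothetical, and the `neo4j-admin backup` flags are an assumption based on the 4.x Enterprise form; other versions spell the command differently, so check the manual for yours.

```python
from datetime import date


def backup_command(backup_root, day, database="neo4j"):
    """Argv for one online backup into a dated directory.

    Assumption: 4.x-style `neo4j-admin backup` flags; verify for your version.
    """
    target = f"{backup_root}/{day.isoformat()}"
    return ["neo4j-admin", "backup", f"--backup-dir={target}", f"--database={database}"]


def backups_to_delete(existing_days, keep=7):
    """Return the dated backups that fall outside the newest `keep` entries."""
    newest_first = sorted(existing_days, reverse=True)
    return sorted(newest_first[keep:])


if __name__ == "__main__":
    print(backup_command("/backups", date.today()))
    history = [date(2020, 1, d) for d in range(1, 11)]
    print(backups_to_delete(history, keep=7))
```

Run from cron or any scheduler once a day, something of this shape is the whole job; the rest is monitoring that it keeps succeeding.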

Auto-scaling can also be solved with just a little more DevOps work. With an Enterprise license you can run a cluster and scale horizontally. EC2 instances also auto-scale. Adobe is running their Behance product on Neo4j, and they have tens of thousands, if not hundreds of thousands, of users all stored in a 3-node Neo4j cluster. This is where I'd ask how volatile your traffic to the database is, and whether you're seeing fluctuations in your traffic patterns large enough that a serverless service is warranted.

Since my original post I have done more research to find performance benchmarks and user experiences. I couldn't find anyone who has had a positive experience using Neptune. I found people who ran benchmarks and found that Neptune would even crash when their graph got to a certain size and the query tried to do more than N recursive traversals. Unfortunately I don't have any of those performance links saved, but I read enough to convince me that no amount of "reduced backup work" is worth sacrificing actual query performance and having my application randomly crash. You'll easily eat up more development time trying to debug random crashes than you would spend in a day configuring a backup schedule and EC2 auto-scaling policies.

Neptune has also only had 4 software releases (v1.0.1.0, v1.0.2.0, v1.0.2.1, v1.0.2.2), which doesn't show me that AWS really cares about this DB technology. I think they found an open source engine that they could tweak and rebrand to check off the box that they have a graph DB, but I don't think they're committed to it. I think they're probably waiting to see if graph DBs rise in popularity before committing developer resources, or they already know they can't compete with Neo4j, so they're going to let their DB flounder for anyone who does want to use it. Obviously that is speculation and opinion, but it's worth considering before choosing which technology your business is going to invest its strategic resources in.

Neo4j has also since come out with their Aura service which takes care of the DBA argument that people use.

AWS Neptune also has a different graph storage model. On its RDF/SPARQL side it is a triple store, whereas Neo4j is a labeled property graph. In that model, say good-bye to the properties and labels that we all enjoy in Neo4j, and be prepared for many, many more relationships to nodes to store the properties that used to sit on a single node.
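
To illustrate the modeling difference, here is a small sketch that flattens a single property-graph node into the (subject, predicate, object) statements a triple store would hold. The `node:` prefix, the predicate names, and the function itself are invented for illustration; a real RDF model would use proper IRIs.

```python
def node_to_triples(node_id, labels, properties):
    """Flatten one labeled property-graph node into (subject, predicate, object) triples."""
    subject = f"node:{node_id}"
    # Each label becomes its own type statement.
    triples = [(subject, "rdf:type", label) for label in labels]
    # Each property becomes its own statement instead of living on the node.
    triples += [(subject, key, value) for key, value in properties.items()]
    return triples


if __name__ == "__main__":
    for triple in node_to_triples(1, ["Person"], {"name": "Alice", "age": 34}):
        print(triple)
```

One node with one label and two properties turns into three separate statements, which is the growth in edges described above.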

As for the query language, Cypher is not only the easier language, it's also one of the foundations of GQL, the graph query language that ISO has taken on as an international standards project for graph databases. That doesn't bode well for TinkerPop or Gremlin staying around, or for any developers trying to find help and support.

This ended up being a longer post than I intended, but I spent a lot of time researching whether AWS Neptune is worth considering and making sure my personal bias wasn't clouding my judgement. It's now been my experience that the people who ask me "why not AWS Neptune?" haven't actually used a graph database; they have just read the marketing pitch from AWS that it's "zero DBA work". They haven't considered what it means to use Neptune, they haven't used it themselves, and they haven't supported it in a production scenario. Their opinion comes entirely from reading marketing material, without any first-hand experience. I think Neo4j is strides ahead in performance, functionality (e.g. the APOC library), AI/ML, and in having a vibrant community to support you if you need help. Those trump backups and auto-scaling.

4 Likes

Thanks for your answer Mike, I really appreciate the time you took to write such a detailed opinion. We are evaluating the two systems at the moment, so all the information we can get to make the right decision is welcome.

1 Like

In Neptune, you don't need to create custom indices, do you? Also, since it is AWS-managed and integrated with multiple AWS services like CloudWatch, monitoring can be done easily. Just a few positives.

1 Like