A ton of (unpleasant) surprises that started a couple of hours ago

As I was sitting developing my web application that made regular requests to the GraphQL endpoint (powered by neo4j-graphql plugin), one of my requests failed to resolve. I tried again and again only to discover that I'm totally unable to access my Neo4j database anymore. Not through neo4j desktop browser, not through GraphiQL, not through HTTPS or bolt...

  1. Checked the load balancer... OK
  2. Checked the EC2 instance health, ping... OK
  3. Checked if I can SSH into EC2 and run sudo service neo4j status... OK

Strange...
Decided to run a quick apt-get upgrade and rebooted. While it was upgrading I noticed a very unexpected thing: Neo4j 4.0.0!!!
Stress started to build up. I looked at the official release notes page... 3.5.14 is currently the latest stable version. What gives? Whatever, the damage is done. Started the upgrade of graph.db. This went well according to the logs (although I had to change certain things in neo4j.conf according to complaints I saw in the log file, and I moved the APOC and GraphQL jars out of the plugins folder as it wouldn't start otherwise).

Now I'm left with a bare bones Neo4j 4.0.0. Given the logs and the size of my graph.db, I haven't lost anything. Service status gives me encouraging feedback:

However, the server seems to be quite busy

I still can't connect to my database through neo4j browser...

Friends, I don't know what to do! I really need your advice here. Thank you for all the help in advance.

OK, I guess the EC2 instance load is due to the fact that Neo4j automatically started rebuilding indexes... Correct me if I'm wrong.

I'll wait until that's done and will reiterate my attempts but, please, do chip in! I'm sweating bullets here as I hit refresh with shaking hands...

To protect yourself from unwanted upgrades, use apt pinning.
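A minimal sketch of what such a pin could look like. It assumes the package is called neo4j and that you want to stay on the 3.5 line; the helper name write_neo4j_pin and the file name are only illustrative.

```shell
# Write an apt preferences entry that keeps the neo4j package on one release line.
# Assumptions: package name "neo4j", release line "3.5.*"; adjust both to your setup.
write_neo4j_pin() {
    pin_dir="${1:-/etc/apt/preferences.d}"   # directory apt reads pin files from
    pattern="${2:-3.5.*}"                    # version glob to stay on
    cat > "$pin_dir/neo4j" <<EOF
Package: neo4j
Pin: version $pattern
Pin-Priority: 1001
EOF
}

# Usage (as root):  write_neo4j_pin
# Check the result: apt-cache policy neo4j
# Blunter option:   apt-mark hold neo4j   (freezes whatever version is installed)
```

A priority above 1000 also lets apt downgrade to the pinned version, which matters if the newer package is already installed.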

You basically have two options:

  1. Wait for the index rebuild to finish. We already have an APOC version that plays nicely with the version you have installed, see https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/tag/4.0.0-rc01. I don't know offhand whether there's already a 4.0 release of the GraphQL plugin.
  2. Roll back to 3.5.x using apt pinning and restore the dataset from a backup. Neo4j has no standard way to downgrade a datastore. With some custom code changes this can be done via https://github.com/jexp/store-utils if absolutely necessary.

Hi @stefan.armbruster,

Thanks for a prompt reaction.
I'd rather try making it work with 4.0.0 as I only use relatively basic APOC procedures which, I'm sure will be playing nicely as you've mentioned.
As far as the GraphQL plugin is concerned, it wasn't really cutting it for me anyway, so I already have a Node backend server written which uses neo4j-graphql-js with Apollo. I had been postponing this migration for a while but I guess fate decided otherwise. I'll try the plugin first though.

I had another question:
The logs keep printing the following message. Is this to be expected?

I saw a message go by, "Bolt server started on 0.0.0.0:7687", at some point but I still can't connect through Neo4j Browser, either with or without the load balancer.
The system is still quite busy. I'll let it run for a while.

What do you think?

The log messages do not look suspicious at all.

As long as the upgrade (index population etc.) is in progress, you cannot connect via Bolt. Can you please post the part of your debug.log covering the latest start until now?

How large is your graph?

@stefan.armbruster, good to hear it doesn't look suspicious. Don't know how to thank you for all the help.
Here's a screenshot of graph.db/ so you can get an idea of the size:

And here's the debug.log
debug.log

Again, thanks for your help!

The log file indicates that you seem to be in an endless start/stop cycle. I don't know why this is happening.

I'd try this:

  1. manually stop the service: systemctl stop neo4j
  2. ensure the java process is stopped
  3. use sudo/su to become the neo4j user and try to start it manually, bypassing systemd. Tar.gz distributions have a bin/neo4j script which you want to use. I suspect the Debian package has this somewhere as well - maybe do a dpkg -L neo4j to list all files of the package.
  4. if you can start it this way, the culprit is the systemd script
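The first three steps could be sketched roughly like this. It assumes the Debian package layout (a service and a user both named neo4j, and the neo4j launcher on the PATH); the DRY_RUN switch and the function names are illustrative additions so the commands can be previewed before anything is touched.

```shell
# Run a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

start_neo4j_without_systemd() {
    run sudo systemctl stop neo4j      # 1. stop the service
    run pgrep -u neo4j -a java         # 2. lists leftover java processes; should print nothing
    run sudo -u neo4j neo4j console    # 3. start in the foreground as the neo4j user
}

# Preview:  DRY_RUN=1 start_neo4j_without_systemd
# For real: start_neo4j_without_systemd
```

Running in the foreground with console means any startup error is printed straight to the terminal instead of being hidden behind a systemd restart.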

I switched to the neo4j user and located the bin file at /usr/bin/neo4j. Tried to run it but got permission denied, as the file is owned by root.


Could my infinite loop be just a matter of wrong permissions somewhere? This particular file and maybe something else too?

This page doesn't specify which folders/files exactly should be owned by whom.

I don't know by heart. Maybe you can spin up a second temporary instance, install neo4j 3.5.13 via apt and compare permissions?
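One way to make that comparison less error-prone than eyeballing ls output is to dump owner, group and mode into a file on each machine and diff the two. A small sketch, assuming GNU stat (standard on Debian/Ubuntu); the function name list_perms is made up for the example.

```shell
# Print "owner:group mode path" for each given file, sorted by path,
# so the output from two machines can be compared line by line.
list_perms() {
    stat -c '%U:%G %a %n' "$@" | sort -k3
}

# On each instance, e.g.:
#   list_perms /usr/bin/neo4j* > perms.txt
# or for every file of the package:
#   dpkg -L neo4j | xargs stat -c '%U:%G %a %n' | sort -k3 > perms.txt
# then copy one file across and run diff on the two.
```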

@stefan.armbruster, I checked permissions on the freshly installed instance and they look just like the ones from my previous message (at least in the /usr/bin/neo4*). I tried to start using sudo /usr/bin/neo4j start while being neo4j user and couldn't. Here's a screenshot of the log:

OK, I commented out HTTPS. I was then able to start my database properly and can now connect through bolt (without an encrypted connection, though). That's already a step forward in diagnostics!

So, previously I was using Neo4j's legacy approach to SSL but, as I understand, it's no longer usable. I'm reading this page, and realise that I need to provide two pem files (cert/key). Fair enough. However, I'm using AWS's ELB (Elastic Load Balancer) in front of the instance. This ELB uses the ACM certificate generated by Amazon AWS itself. What's the SSL configuration in this case? Should I still generate another cert/key pair on the EC2 instance myself and specify it in neo4j.conf? I'm slightly lost...

A little update:

  1. I generated ssl cert/key pair on EC2
  2. Installed this version of Netty
  3. Specified the following configurations:
dbms.connector.bolt.enabled=true
dbms.connector.http.enabled=false
dbms.connector.https.enabled=true
bolt.ssl_policy=client_policy
https.ssl_policy=client_policy
dbms.netty.ssl.provider=OPENSSL
dbms.ssl.policy.bolt.base_directory=certificates/client_policy
dbms.ssl.policy.https.base_directory=certificates/client_policy

My certificates/client_policy folder structure follows the documentation:


and

When I was starting the server without

bolt.ssl_policy=client_policy
https.ssl_policy=client_policy

HTTPS policy was not recognised:

When I've added

bolt.ssl_policy=client_policy
https.ssl_policy=client_policy

back, I started getting the "deprecated" warnings:

I know Neo4j 4.0.0 config file changed a bit. I addressed all startup comments that were complaining about deprecated options and substituted them with alternatives as per Neo4j's startup message suggestions.

I'm sure I'm almost there and it's just a matter of modifying a couple of other flags in the new version of neo4j.conf, but which flags? I see no documentation about it. Maybe I should explicitly specify the private_key, public_certificate, trusted_dir and revoked_dir options that Neo4j normally determines automatically? I tried to install Neo4j 4.0.0 on a separate fresh EC2 instance to get some inspiration from there, as @stefan.armbruster suggested, but the latest version I can get from apt is now 3.5.14!

Any help would be highly appreciated. Thanks in advance.

Meanwhile we've removed the 4.0.0 version from the repository to avoid unwanted upgrades like the one you suffered from, so you now get 3.5.14 as the latest.

You should see the 4.0.0 default config file in /etc/neo4j/neo4j.conf.dpkg-dist (or a similar name). I'd just use the default distribution config file and adapt it as needed. There is also a 4.0 migration guide.
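To see which settings in the current config the packaged default does not mention at all, something along these lines might help. It is only a sketch: it assumes plain key=value lines with "#" comments, treats commented-out settings in the default file as known, and the function name is made up. A key showing up in the output is a hint that it was renamed or removed, not proof.

```shell
# Print setting names that are active in config $1 but do not appear
# (active or commented out) in the distribution default $2.
keys_unknown_to_default() {
    mine="$(mktemp)"
    dist="$(mktemp)"
    # active settings in the current config
    grep -E '^[A-Za-z0-9_.]+=' "$1" | cut -d= -f1 | sort -u > "$mine"
    # settings in the default file, including "#key=value" examples
    sed -E 's/^[[:space:]]*#[[:space:]]*//' "$2" \
        | grep -E '^[A-Za-z0-9_.]+=' | cut -d= -f1 | sort -u > "$dist"
    comm -23 "$mine" "$dist"
    rm -f "$mine" "$dist"
}

# Example (the .dpkg-dist name may differ on your system):
#   keys_unknown_to_default /etc/neo4j/neo4j.conf /etc/neo4j/neo4j.conf.dpkg-dist
```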