Troubleshooting Connection Issues to Neo4j

(M. David Allen) #1

Troubleshooting Connection Issues to Neo4j (including Browser, Cypher Shell, and Driver Applications)

This post describes common issues users may encounter when connecting Neo4j Browser, cypher-shell, or driver applications to a Neo4j database, and how to address them.

Connection Timeout

Symptom: connection attempts hang for a long time, and then fail with "connection timed out" errors.

Example:

$ cypher-shell -a 37.204.217.197 -u neo4j -p myPassword
connection timed out: /37.204.217.197:7687

Troubleshooting steps:

  1. Ensure that the address is correct.
  2. If the server is listening for Bolt connections on a port other than 7687, ensure that you pass the port explicitly to your client (e.g. cypher-shell) or to any program you have written.
  3. Ensure that firewall rules do not block traffic on the Bolt port.
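It can help to note how a connection attempt fails: a firewall that silently drops packets produces a timeout, while a reachable host with nothing listening on the port refuses the connection immediately. Below is a minimal Python sketch of that check, roughly what nc tells you; the function name probe_port and the 3-second default are illustrative choices, not part of any Neo4j tool.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to host:port.

    Returns:
      "open"        - the TCP handshake succeeded
      "refused"     - the host answered but nothing is listening on the port
      "timeout"     - no answer within `timeout` (typical of dropped packets)
      "unreachable" - any other network error (no route, bad address, ...)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "unreachable"
```

For example, probe_port("37.204.217.197", 7687) returning "timeout" is consistent with packets being dropped by a firewall, whereas "refused" suggests the address is right but Bolt is not listening on that port.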

Common causes of this error:

  1. A cloud instance of Neo4j is launched with no security groups or port access defined. Bolt is available
    at the right address, but firewall rules prevent access. Packets are dropped, so the result is a connection timeout.
  2. Non-standard configuration of Neo4j that runs Bolt on a port other than 7687, for example to comply with local network policies.
  3. The server is not yet available. For a period of time while starting up, particularly if the database is repairing files or migrating an old store, the Bolt endpoint may not be available. You will know that it is available when the logs contain a message that looks like this: 2018-05-25 13:34:34.584+0000 INFO Bolt enabled on 127.0.0.1:7687.
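If you are scripting around server startup, one way to wait for that message is to scan the log for it. This is a minimal sketch, assuming the log line has the shape quoted above; the log file's name and location (typically neo4j.log) depend on how Neo4j was installed, and the helper name is hypothetical.

```python
import re
from typing import Optional

# Matches lines like:
#   2018-05-25 13:34:34.584+0000 INFO Bolt enabled on 127.0.0.1:7687.
# The trailing period and a Windows-style "\r" are optional.
BOLT_ENABLED = re.compile(r"Bolt enabled on (\S+?)\.?\r?$", re.MULTILINE)


def bolt_enabled_address(log_text: str) -> Optional[str]:
    """Return the address from the most recent 'Bolt enabled on' line, or None."""
    matches = BOLT_ENABLED.findall(log_text)
    return matches[-1] if matches else None
```

Polling the log contents with this until it returns an address is a simple way to hold off client connections until Bolt is actually up.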

ServiceUnavailable: WebSocket connection failure

A similar message you might see is: WebSocket connection failure. Due to security constraints in your web browser, the reason for the failure is not available to this Neo4j Driver.

Symptom: you can load Neo4j Browser and enter credentials, but the connection fails with a message
about WebSocket connection failures.

It looks like this: https://imgur.com/3Y7NBDg.png

Explanation: this is commonly seen with Firefox and some versions of Internet Explorer, when Neo4j Browser
is used with an untrusted SSL certificate. When users click to accept the exception and permit traffic, those
browsers authorize that action for only the port that Neo4j Browser is running on, not for all ports on that
host. As a result, the browser's security policy fails the WebSocket connection to the bolt port.

Available Resolutions:

  1. Use a signed SSL certificate (follow these directions to generate certificates)
  2. Follow directions for your browser to trust the server's certificate for the bolt port, and then refresh the page.
  3. Use Chrome
  4. Set dbms.connector.bolt.tls_level=OPTIONAL in your neo4j.conf. Be aware that Bolt connections may then be
    unencrypted, but this is a way of side-stepping web browser issues with the untrusted certificate.
  5. If you are using Neo4j 3.5.0 specifically (and only that version), this can be caused by a bug and this work-around is available.

If using a signed SSL certificate is not an option for you, you must configure your browser to trust the unsigned certificate both on port 7473 (HTTPS) and 7687 (bolt). Configuring trust just for HTTPS is insufficient for browsers that enforce trust per-port, instead of per-host (such as Firefox). Consult the help documentation for your browser to determine how to do this, as it varies depending on your browser and operating system.
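To see which certificate each port actually presents, and therefore what your browser is being asked to trust, you can fetch and fingerprint both. This is a Python sketch, assuming TLS is enabled on both the HTTPS (7473) and Bolt (7687) connectors. Verification is switched off on purpose, because the goal is to identify the (typically self-signed) certificate, not to trust it. The function names are illustrative.

```python
import hashlib
import socket
import ssl


def fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint in the colon-separated hex form browsers display."""
    digest = hashlib.sha256(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def presented_certificate(host: str, port: int, timeout: float = 5.0) -> str:
    """Fingerprint of the certificate a TLS endpoint presents (unverified)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE     # inspect only; do not trust
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return fingerprint(tls.getpeercert(binary_form=True))


def compare_ports(host: str, ports=(7473, 7687)) -> dict:
    """Map each port to its certificate fingerprint, or to the error raised."""
    result = {}
    for port in ports:
        try:
            result[port] = presented_certificate(host, port)
        except OSError as exc:          # ssl.SSLError is a subclass of OSError
            result[port] = f"error: {exc}"
    return result
```

If both ports report the same fingerprint, the browser is looking at the same certificate twice; a browser that enforces trust per port, such as Firefox, still needs an exception for each port.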

Failed to Establish Connection in (5000)ms.

When a driver attempts to connect to the server, it has a default amount of time that it will wait for a response from the server before giving up. When you get this message, it generally means that you did make a connection to the server, but the server didn't respond within that timeout window. The window may not be 5000ms: it is a configurable driver setting, and its value depends on which language driver you're using and on your local configuration.

A common reason for this error is that your Neo4j instance is under heavy load. For example, if you're running a query that is soon going to result in an Out of Memory error, you could run into this error. Another possibility is extremely high network latency between your machine and the Neo4j instance, for example if you're on a low-quality Wi-Fi link.
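To test the network-latency explanation, you can time raw TCP handshakes to the Bolt port. This measures only the network path, not the Bolt handshake or how busy the server is. The timeout itself lives in your driver's configuration (the official Python driver, for instance, calls it connection_timeout; check your own driver's documentation). The helper names below are illustrative.

```python
import socket
import statistics
import time


def connect_latency_ms(host: str, port: int, attempts: int = 5,
                       timeout: float = 5.0) -> list:
    """Time `attempts` TCP handshakes to host:port, in milliseconds.

    Failed attempts (timeouts, refusals, unreachable hosts) are recorded as None.
    """
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                samples.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            samples.append(None)
    return samples


def summarize(samples: list) -> str:
    """One-line summary: success count, median and worst handshake time."""
    ok = [s for s in samples if s is not None]
    if not ok:
        return "no successful connections"
    return (f"{len(ok)}/{len(samples)} ok, "
            f"median {statistics.median(ok):.1f} ms, max {max(ok):.1f} ms")
```

If handshakes are consistently fast but the driver still reports this error, network latency becomes an unlikely cause and server load a more likely one.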

(Nigel Small) #2

I'd highlight several other things here too.

Firstly, check and double check that you have the right port numbers as well as the right configuration. Port 7687 is the default port for Bolt (both secure and insecure) and 7474 is the default port for (insecure) HTTP. Secure HTTP defaults to port 7473. I've seen these mixed up multiple times, and have also seen attempts to make secure connections to insecure servers and vice versa.

Use telnet or nc from the command line to test if a port is open and listening. These should both hold the connection open if one can be established, and return immediately if not.

$ nc localhost 7687

You can also use curl to check for HTTP:

$ curl localhost:7474
{
  "management" : "http://localhost:7474/db/manage/",
  "data" : "http://localhost:7474/db/data/",
  "bolt" : "bolt://localhost:7687"
}
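The bolt entry in that response is, as far as we understand, the address Neo4j Browser will try next, so it is worth checking what host it names. This is a small Python sketch that performs the same request and extracts just that field, assuming the 3.x response shape shown above (later Neo4j versions return different keys). The function names are illustrative.

```python
import json
from urllib.request import urlopen


def bolt_url_from_discovery(body: str) -> str:
    """Extract the Bolt URL from the JSON returned by the HTTP root endpoint."""
    doc = json.loads(body)
    if "bolt" not in doc:
        raise ValueError("discovery document has no 'bolt' entry")
    return doc["bolt"]


def fetch_bolt_url(base: str = "http://localhost:7474",
                   timeout: float = 5.0) -> str:
    """Equivalent of `curl localhost:7474`, returning only the Bolt URL."""
    with urlopen(base, timeout=timeout) as resp:
        return bolt_url_from_discovery(resp.read().decode("utf-8"))
```

If the host in that URL is not one your client can reach (for example localhost when you are connecting from another machine), that is a good place to start investigating.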

Look for potential IPv4 vs IPv6 issues. If you're using localhost, this can resolve differently depending on your operating system. Trying to connect to 127.0.0.1:7687 or [::1]:7687 explicitly can help diagnose name resolution issues with localhost:7687. If your network is IPv4 only, your neo4j.conf file should contain 0.0.0.0 for your dbms.connectors.default_listen_address. If you want to listen for both IPv4 and IPv6 connections, this will need to be set to :: (double colon) instead.
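To see what localhost (or any hostname) actually resolves to on a given machine, and in which order a client would try the results, you can ask the system resolver directly. A minimal Python sketch using socket.getaddrinfo; the function name is illustrative.

```python
import socket

FAMILY_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def address_candidates(host: str, port: int = 7687) -> list:
    """List the (family, address) pairs the resolver returns for host:port, in order."""
    seen = []
    for family, _, _, _, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        entry = (FAMILY_NAMES.get(family, str(family)), sockaddr[0])
        if entry not in seen:       # drop duplicates, keep resolver order
            seen.append(entry)
    return seen
```

If localhost lists ("IPv6", "::1") first while Neo4j listens only on 0.0.0.0, a client may try ::1 before falling back to 127.0.0.1, or fail outright, depending on the client.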

Lastly, multiple layers of network hardware can have complex interactions, particularly around timeouts. If you're seeing dropped connections, could there be something in your network that kills these if it thinks they're idle? Does it work locally, but not on AWS? Amazon has some quite aggressive connection-killing rules by default and it also ignores the TCP_KEEPALIVE setting that we enable.
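One possible client-side mitigation, assuming the device that drops idle connections counts keepalive probes as activity, is to send keepalives much sooner than the operating system default (on Linux, two hours of idle time pass before the first probe). This is a Python sketch; the 60-second idle value is an illustrative assumption and should be set below whatever idle timeout your network actually enforces.

```python
import socket


def keepalive_socket(idle: int = 60, interval: int = 10,
                     count: int = 5) -> socket.socket:
    """Create a TCP socket that probes the peer after `idle` seconds of silence.

    SO_KEEPALIVE is portable; the idle/interval/count tunables are
    platform-specific, so each is applied only where Python exposes it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle),      # seconds before first probe
                        ("TCP_KEEPINTVL", interval), # seconds between probes
                        ("TCP_KEEPCNT", count)):     # failed probes before drop
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```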

(Njho) #3

@david.allen

Currently trying to launch the Community AMI with Neo4j browser v3.2.10 & Neo4j 3.4.9 on Google Chrome.

Are the solutions to ServiceUnavailable: WebSocket connection failure still working?
Is using an SSL certificate the only way to get going using the Community AMI?

Solutions Tried:

  1. Follow directions for your browser to trust the server's certificate for the bolt port, and then refresh the page.
    Does not work. http://ec2-54-xxx-xxx-111.compute-1.amazonaws.com:7687/ on Chrome returns "not a WebSocket handshake request: missing upgrade", and there is no option to trust this certificate.
  2. Use Google Chrome
    Currently using Google Chrome - Does not work :frowning:
  3. Set dbms.connector.bolt.tls_level=OPTIONAL in your neo4j config.
    Does this work? Updating this and dbms.connector.bolt.address=0.0.0.0:7687 did not seem to correct the issue, even after verifying that the neo4j.conf file had changed.

Curious what the best option is to get going! Thanks!

(M. David Allen) #4

Using an SSL certificate is a good option, but the other option you have is to open traffic on port 7474 and use regular HTTP instead of HTTPS. We enable HTTPS by default and not HTTP because of Amazon security requirements. If you are OK with passing details back and forth without encryption, then HTTP can be used without an SSL cert.

(Socratic) #5

Similar to njho's message above, I am using Neo4j Graph Database - Community Edition from the Amazon Marketplace to set up an AMI.

When I try to use regular HTTP and port 7474 in Google Chrome, I get the "ServiceUnavailable: WebSocket connection failure" error, even though the command nc localhost 7687 successfully holds the port open.

I am using Google Chrome and I set dbms.connector.bolt.tls_level=OPTIONAL, but that doesn't help. And I can't figure out how to tell Chrome to "trust the server's certificate for the bolt port". This AMI uses Neo4j 3.5.1 and not Neo4j 3.5.0.

So the only remaining option is to set up a certificate, but I'm having trouble using the instructions at Getting Certificates for Neo4j with LetsEncrypt. The EC2 Management Console's 'Description' tab has an entry for the "Public DNS (IPv4)" for the instance. I tried to use this entry to satisfy the "must have a valid DNS address" requirement, but LetsEncrypt fails with it. And I was unable to understand how to use Google Domains and Google Cloud DNS.

Any advice would be appreciated. Also, I wonder why Amazon can't make it easy to create AMIs that have a certificate. You have said that "without knowing your local DNS configuration, the cloud image can't do the certificate bits for you." Doesn't the EC2 Management Console know the "local DNS configuration"? Shouldn't I be able to ssh into an instance and run a single command to set up the certificate?

(M. David Allen) #6

Hi @socratic, several things are going on here.

Unfortunately, SSL certs are fundamentally tied to domain names, and you can't get one for a bare IP address. That isn't a neo4j policy, just the rules of how SSL works. Now, the unfortunate interaction here is that the Neo4j browser app requires bolt connectivity, which interacts with the browser's security model, which is what's putting you in the situation where you need the cert. Sorry that this is a pain but there are elements of dynamic cloud setup & the browser security model here that are not really in Neo4j's control.

We can't automatically set up the certificate for you for several reasons. On AWS, not everybody gets a DNS name in the first place. When you do automatically get a DNS name, it's typically derived from your IP address (it ends up being something like ec2-X-Y-Z-A, where X-Y-Z-A is your IP address). SSL certs are bound to specific host names, so issuing a cert for a hostname like this would not be a good idea: it would initially work, but if you stopped and restarted your VM, you could end up with a different hostname, which would effectively break your certificate. Suppose we allowed you to enter your own custom domain name, for example with a CloudFormation template. Even then, for the automatic SSL setup to work, you would have had to create the DNS record mapping that name to your IP in advance, so that the name resolves when LetsEncrypt runs its test probe. A bit more on that...
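The coupling between the auto-assigned name and the IP address can be seen mechanically: the first label of a public EC2 name encodes the IPv4 address, so a new IP after a stop/start means a new name, and a certificate issued for the old name no longer matches. A small illustrative Python sketch; the helper name is hypothetical and the hostname in the test is made up.

```python
import ipaddress
import re
from typing import Optional

# Public EC2 names look like ec2-A-B-C-D.<region-specific suffix>,
# e.g. ec2-203-0-113-25.compute-1.amazonaws.com
EC2_NAME = re.compile(r"^ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.")


def ip_from_ec2_hostname(hostname: str) -> Optional[str]:
    """Return the IPv4 address encoded in an EC2 public DNS name, or None."""
    m = EC2_NAME.match(hostname.lower())
    if not m:
        return None
    try:
        # Validates each octet (rejects e.g. 300).
        return str(ipaddress.IPv4Address(".".join(m.groups())))
    except ipaddress.AddressValueError:
        return None
```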

On the problems you're having with the certificate instructions: I don't think we can help without knowing exactly what the LetsEncrypt failure is. That said, the most common failures by far are either the wrong host name being given, or the probe port not being open. Again, this is LetsEncrypt's process, not Neo4j's, but to issue you a cert they require that you "prove" you control the domain. So you provide "mydomain.com" (or whatever), and their service tries to reach that address on the probe port. If that probe port is firewalled off (which it would be in the standard Neo4j deploy, which doesn't need that port), your setup could fail. Something to double check.
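Before retrying LetsEncrypt, it can help to check both common causes, ideally from a machine outside your cloud network so the firewall is actually exercised: that the domain resolves to the instance's public IP, and that the probe port is reachable. For LetsEncrypt's HTTP-01 challenge, that port is 80. This is a Python sketch; the function names and defaults are illustrative.

```python
import socket


def resolves_to(domain: str, expected_ip: str) -> bool:
    """True if `domain` currently resolves (IPv4 or IPv6) to `expected_ip`."""
    try:
        infos = socket.getaddrinfo(domain, None)
    except socket.gaierror:
        return False
    return any(info[4][0] == expected_ip for info in infos)


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def letsencrypt_preflight(domain: str, public_ip: str,
                          probe_port: int = 80) -> dict:
    """Check the two most common causes of a failed domain validation."""
    return {
        "dns_points_here": resolves_to(domain, public_ip),
        "probe_port_open": port_reachable(domain, probe_port),
    }
```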

When you set dbms.connector.bolt.tls_level=OPTIONAL, this permits clients to connect unencrypted, but it won't help you if, for example, Chrome insists that your connection be encrypted as part of its own security model.

As stated above if you want encrypted communications, Neo4j ships with an untrusted cert out of the box (it can't be trusted because it can't know what DNS name you'll have, and hence can't be signed by a CA). So you can either "trust the untrusted cert", or you can generate a trusted cert.

If you're launching an AMI on AWS, you don't need to do anything with Google Domains and Google Cloud DNS. Rather, if you're on Amazon, you'd use their Route 53 service to register a domain, and their Certificate Manager tool. On AWS you could generate the cert on your own (without LetsEncrypt) and end up with the same files you need to do the SSL setup.

As for why you get "ServiceUnavailable" when you use port 7474 -- I just don't know, we'd need more information. The availability of port 7687 is one thing, but then tls_level is another, and firewalling configuration is a third.

I hope this helps.

(M. David Allen) #7

For those who want to get a valid SSL certificate for Neo4j, please consult this article:

(Shahriar Fakher) #8

Thank you very much for your support. Now I have my server up and running with a valid SSL certificate, but somehow I am not able to connect to the database. This is the error:

WebSocket connection to 'wss://xxxxxxxxx.xx:7687/' failed: Error in connection establishment: net::ERR_ADDRESS_UNREACHABLE

Do you have any hints on how to solve this issue?

Thank you very much

P.S. After some changes in my docker container I see the following error in the logs

ERROR The RuntimeException could not be mapped to a response, re-throwing to the HTTP container Unable to construct bolt discoverable URI using 'null' as hostname: Expected scheme-specific part at index 5: bolt:

This is the first time for me setting up Neo4J on a cloud platform, so at the moment I've got no clue what is going wrong :thinking:

(M. David Allen) #9

Can you give me some more information? What tool gives you this error, and when do you see it? The error literally means that the address is unreachable: you've probably provided an IP address that packets can't get to or from. So double check the address you enter, and check the firewall rules with your cloud provider to verify that the firewall lets the network traffic through.

Please say more about what your Docker container is; otherwise it's hard to tell.

(Shahriar Fakher) #10

Hi David,

I'm running Neo4j (3.5.3 Enterprise) on a Jelastic PaaS in a Docker container, along with an NGINX load balancer.

While connecting to the Neo4j Browser, I get the mentioned error and I'm not able to connect to the server.

As far as I can see, the server seems to work fine.

Do you need more information?

Thank you very much

(M. David Allen) #11

The load balancer may be the issue here. Can you share its configuration? We typically don't use load balancers in front of Neo4j because of the way the routing protocol works. If it's a single instance you don't need a load balancer, and if it's a cluster, load balancers often interfere.

Also, this error is different than the one you reported above. Can't connect and address unroutable are two different problems. When these things occur, it is most likely the case that port 7687 is not getting to Neo4j through some layer of your network config. I'm not sure I can help with this because it depends on how you've configured Jelastic, the load balancer, and the firewall. But this is where I would look.

(Shahriar Fakher) #12

O.K. David, thank you very much indeed. I'll set up a new environment without the load balancer and check how it works.

Thanks for your support

(Nithin34 It) #13

Hi,
The issue can be resolved in three steps.

  1. In AWS Route 53, add a record for the Neo4j instance's IP address, and copy that hostname.
  2. Edit neo4j.conf and add one more line:
    dbms.connector.bolt.advertised_address=<host_name>:7687
  3. Stop the database and start it again. The Bolt connection details should then appear automatically; log in with your credentials and you will be able to access the database.

Thanks,
Nithin.
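The configuration change in step 2 can also be scripted. Below is a hypothetical Python helper that sets a key in the text of neo4j.conf, replacing an existing (possibly commented-out) line for that key or appending a new one. The hostname neo4j.example.com in the test is a placeholder for your Route 53 name, and you still need to write the file back and restart the database as in step 3.

```python
import re


def set_setting(conf_text: str, key: str, value: str) -> str:
    """Return neo4j.conf text with `key=value` set.

    The first existing line for `key` (commented out or not) is replaced
    in place; if there is none, the setting is appended at the end.
    """
    pattern = re.compile(rf"^[ \t]*#?[ \t]*{re.escape(key)}[ \t]*=.*$",
                         re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(conf_text):
        # A function replacement avoids backslash escapes in `value`.
        return pattern.sub(lambda _m: new_line, conf_text, count=1)
    suffix = "" if not conf_text or conf_text.endswith("\n") else "\n"
    return f"{conf_text}{suffix}{new_line}\n"
```

Typical use would be to read neo4j.conf, pass its contents through set_setting with the key dbms.connector.bolt.advertised_address, and write the result back before restarting.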

(Shahriar Fakher) #14

Finally I've got the solution :laughing: Maybe it was too simple to be mentioned anywhere, but for me this was the key: I opened the address "https://domain:7687", added the certificate there once again manually, and voilà! Now everything is working fine.
I hope this also helps others avoid struggling so long with this issue.
