Troubleshooting Connection Issues to Neo4j (including Browser, Cypher Shell, and Driver Applications)

This post describes common issues users may encounter when connecting Neo4j Browser, cypher-shell, or driver applications to a Neo4j database, and how to address them.

Connection Timeout

Symptom: connection attempts hang for a long time and then fail with "connection timed out" errors.

Example:

$ cypher-shell -a 37.204.217.197 -u neo4j -p myPassword
connection timed out: /37.204.217.197:7687

Troubleshooting steps:

  1. Ensure that the address is correct.
  2. If the server listens for bolt connections on a port other than 7687, pass the port explicitly to your client (e.g. cypher-shell) or to any program you have written.
  3. Ensure that firewall rules do not prohibit traffic on the bolt port.
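A quick way to tell these cases apart from the client machine is to look at how the TCP connection fails. The following is a minimal sketch in Python (standard library only; probe_port is a hypothetical helper name, not part of any Neo4j tool). A firewall that silently drops packets shows up as a timeout, while a reachable host with nothing listening on the port answers with an immediate refusal.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt to host:port (hypothetical helper).

    "open"    - the TCP connection was established.
    "refused" - the host answered, but nothing is listening (or a firewall
                actively rejected the connection).
    "timeout" - no answer in time: wrong address, or packets silently dropped.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "network unreachable" or a DNS lookup failure.
        return f"error: {exc}"
```

For example, probe_port("37.204.217.197", 7687) returning "timeout" points at firewall rules or a wrong address, while "refused" means the host is reachable but nothing is listening on that port.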

Common causes of this error:

  1. A cloud instance of Neo4j is launched without security groups or other rules that open its ports. Bolt is listening
    at the right address, but firewall rules prevent access: packets are silently dropped, so the result is a connection timeout.
  2. Non-standard configuration of neo4j which runs bolt on a port other than 7687, for example to comply with local network policies.
  3. The server is not yet available. For a period of time while starting up, and particularly if the database is repairing files or migrating an old store, the bolt endpoint may not be available. You will know that it is available when the logs contain a message that looks like this: 2018-05-25 13:34:34.584+0000 INFO Bolt enabled on 127.0.0.1:7687.
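If you are scripting around a restart, one option is to poll the bolt port until it accepts connections rather than failing on the first attempt. A minimal Python sketch follows; wait_for_port is a hypothetical helper, and the two-minute default deadline is an arbitrary assumption (a store that is being repaired or migrated may take much longer).

```python
import socket
import time


def wait_for_port(host: str, port: int, deadline_s: float = 120.0,
                  interval_s: float = 2.0) -> bool:
    """Poll host:port until a TCP connection succeeds or the deadline passes.

    Returns True as soon as the port accepts a connection, False otherwise.
    An accepting port only shows that something is listening; the
    "Bolt enabled on ..." log line remains the authoritative readiness signal.
    """
    end = time.monotonic() + deadline_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            pass  # refused, timed out or unreachable: not ready yet
        remaining = end - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval_s, remaining))
```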

ServiceUnavailable: WebSocket connection failure

A similar message you might see is: WebSocket connection failure. Due to security constraints in your web browser, the reason for the failure is not available to this Neo4j Driver.

Symptom: you can connect to Neo4j Browser and enter credentials, but fail to connect with a message
about WebSocket connection failures.

It looks like this: https://imgur.com/3Y7NBDg.png

Explanation: this is commonly seen with Firefox and some versions of Internet Explorer, when Neo4j Browser
is used with an untrusted SSL certificate. When users click to accept the exception and permit traffic, those
browsers authorize that action for only the port that Neo4j Browser is running on, not for all ports on that
host. As a result, the browser's security policy fails the WebSocket connection to the bolt port.

Available Resolutions:

  1. Use a signed SSL certificate (follow these directions to generate certificates)
  2. Follow the directions for your browser to trust the server's certificate for the bolt port, and then refresh the page. (In Chrome, this can be forced by visiting https://your-host:7687 and accepting the certificate, even though the bolt port does not actually serve HTTPS.)
  3. Use Chrome
  4. Set dbms.connector.bolt.tls_level=OPTIONAL in your neo4j config. Be aware that bolt connections may not be
    encrypted, but this is a method of side-stepping web browser issues with the untrusted certificate.
  5. If you are using Neo4j 3.5.0 specifically (and only that version), this can be caused by a bug and this work-around is available.

If you are using Neo4j 4.0, be aware that some defaults have changed. We also recommend these settings for self-signed certificates in 4.0:

     dbms.ssl.policy.bolt.client_auth=NONE
     dbms.ssl.policy.https.client_auth=NONE

If using a signed SSL certificate is not an option for you, you must configure your browser to trust the self-signed certificate on both port 7473 (HTTPS) and port 7687 (bolt). Configuring trust just for HTTPS is insufficient for browsers that enforce trust per port instead of per host (such as Firefox). Consult the help documentation for your browser to determine how to do this, as it varies by browser and operating system.
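To confirm which certificate each port presents (and therefore what your browser has to trust on each one), you can fetch and fingerprint them. A minimal Python sketch, assuming both ports have TLS enabled; fingerprint and served_fingerprints are hypothetical helper names. The certificates are fetched without verification, which is what you want when inspecting a self-signed cert.

```python
import hashlib
import ssl


def fingerprint(pem_cert: str) -> str:
    """SHA-256 fingerprint of a PEM-encoded certificate, as colon-separated hex."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def served_fingerprints(host: str, ports=(7473, 7687)) -> dict:
    """Fetch the certificate each port presents (unverified) and
    return {port: fingerprint}. Each port must speak TLS directly."""
    return {port: fingerprint(ssl.get_server_certificate((host, port)))
            for port in ports}
```

If served_fingerprints("your-host") shows the same fingerprint on 7473 and 7687, it is the same self-signed certificate, and a browser that trusts per port still needs an exception for each port.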

Failed to Establish Connection in (5000)ms.

When a driver attempts to connect to the server, it waits a default amount of time for a response before giving up. This message generally means that you did make a connection to the server, but the server did not respond within that timeout window. The window may not be 5000ms: it is a configurable driver setting, and its value depends on which language driver you're using and on your local configuration.

A common reason for this error is that your Neo4j instance is under heavy load. For example, if you're running a query that will soon end in an Out of Memory error, you may run into this error. Another possibility is extremely high network latency between your machine and the Neo4j instance, for example on a low-quality wifi link.
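To tell "cannot connect at all" apart from "connected, but the server is not answering", you can perform only the first step of the Bolt protocol, the version handshake, with your own timeout. A minimal Python sketch, assuming the port accepts unencrypted connections (e.g. tls_level OPTIONAL); bolt_handshake is a hypothetical helper, and proposing versions 4, 3, 2, 1 is an assumption meant to cover Bolt 1 through 4.0.

```python
import socket
import struct

# The four "magic" bytes a Bolt client sends before its version proposals.
BOLT_MAGIC = b"\x60\x60\xb0\x17"


def bolt_handshake(host: str, port: int = 7687, timeout: float = 5.0,
                   versions=(4, 3, 2, 1)):
    """Perform only the Bolt version handshake (hypothetical helper).

    Returns the 32-bit version the server agreed to (0 means none of the
    proposals was accepted), or None if TCP connected but no reply arrived
    within `timeout` seconds. A failed TCP connect raises instead.
    """
    # Exactly four 32-bit big-endian proposals; unused slots are zero.
    proposals = (list(versions) + [0, 0, 0, 0])[:4]
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(BOLT_MAGIC + struct.pack(">4I", *proposals))
        reply = b""
        try:
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:
                    raise ConnectionError("server closed the connection during the handshake")
                reply += chunk
        except socket.timeout:
            return None  # connected, but the server never answered
    return struct.unpack(">I", reply)[0]
```

A result of None matches the heavy-load situation described above; an exception from the connect itself means the connection was never established, which is the timeout/firewall case from the first section.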

I'd highlight several other things here too.

Firstly, check and double check that you have the right port numbers as well as the right configuration. Port 7687 is the default port for Bolt (both secure and insecure), 7474 is the default port for (insecure) HTTP, and secure HTTP defaults to port 7473. I've seen these mixed up multiple times, and have also seen attempts to make secure connections to insecure servers and vice versa.

Use telnet or nc from the command line to test whether a port is open and listening. Both hold the connection open if one can be established, and exit with an error if not (immediately if the connection is refused, or only after a timeout if packets are being dropped).

$ nc localhost 7687

You can also use curl to check for HTTP:

$ curl localhost:7474
{
  "management" : "http://localhost:7474/db/manage/",
  "data" : "http://localhost:7474/db/data/",
  "bolt" : "bolt://localhost:7687"
}
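The same discovery document can be fetched from a script. A minimal Python sketch (fetch_discovery and bolt_uris are hypothetical helper names): Neo4j 3.x reports a single "bolt" entry, as shown above, while Neo4j 4.0 reports "bolt_direct" and "bolt_routing" instead, so the sketch accepts either.

```python
import json
import urllib.request


def fetch_discovery(base_url: str = "http://localhost:7474", timeout: float = 5.0) -> dict:
    """GET the JSON discovery document served at the HTTP root.
    Raises urllib.error.URLError if the server cannot be reached."""
    req = urllib.request.Request(base_url + "/", headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def bolt_uris(discovery: dict) -> dict:
    """Pick out whichever Bolt URIs the server reports (3.x or 4.0 keys)."""
    keys = ("bolt", "bolt_direct", "bolt_routing")
    return {k: discovery[k] for k in keys if k in discovery}
```

Comparing the Bolt address reported here with the address your client actually uses is a quick sanity check.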

Look for potential IPv4 vs IPv6 issues. localhost can resolve differently depending on your operating system, so trying to connect to 127.0.0.1:7687 or [::1]:7687 explicitly can help diagnose name resolution issues with localhost:7687. If your network is IPv4 only, your neo4j.conf file should set dbms.connectors.default_listen_address to 0.0.0.0 (in Neo4j 4.0 this setting is replaced by dbms.default_listen_address). If you want to listen for both IPv4 and IPv6 connections, set it to :: (double colon) instead.
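The "Bolt enabled on ..." log line tells you which address Bolt is actually bound to. If it shows 127.0.0.1 (or localhost), only connections from the same machine are accepted, and connections from other machines are refused. A minimal Python sketch that extracts the address from such a line; bolt_listen_address is a hypothetical helper, and it assumes the log format quoted earlier in this post.

```python
import ipaddress
import re

# Matches e.g. "Bolt enabled on 127.0.0.1:7687." or "Bolt enabled on [::]:7687."
_BOLT_LINE = re.compile(r"Bolt enabled on (\[?[^\]\s]+?\]?):(\d+)")


def bolt_listen_address(log_line: str):
    """Return (host, port, loopback_only) from a "Bolt enabled on" line,
    or None if the line does not contain that message."""
    m = _BOLT_LINE.search(log_line)
    if not m:
        return None
    host, port = m.group(1).strip("[]"), int(m.group(2))
    try:
        loopback = ipaddress.ip_address(host).is_loopback
    except ValueError:
        # A hostname rather than an IP literal.
        loopback = host == "localhost"
    return host, port, loopback
```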

Lastly, multiple layers of network hardware can have complex interactions, particularly around timeouts. If you're seeing dropped connections, could there be something in your network that kills these if it thinks they're idle? Does it work locally, but not on AWS? Amazon has some quite aggressive connection-killing rules by default and it also ignores the TCP_KEEPALIVE setting that we enable.


@david_allen

Currently trying to launch the Community AMI with Neo4j browser v3.2.10 & Neo4j 3.4.9 on Google Chrome.

Are the solutions to ServiceUnavailable: WebSocket connection failure still working?
Is using an SSL certificate the only way to get going using the Community AMI?

Solutions Tried:

  1. Follow directions for your browser to trust the server's certificate for the bolt port, and then refresh the page.
    Does not work. http://ec2-54-xxx-xxx-111.compute-1.amazonaws.com:7687/ on Chrome returns: not a WebSocket handshake request: missing upgrade, and there is not an option to trust this certificate.
  2. Use Google Chrome
    Currently using Google Chrome. Does not work.
  3. Set dbms.connector.bolt.tls_level=OPTIONAL in your neo4j config.
    Does this work? Updating this setting and dbms.connector.bolt.address=0.0.0.0:7687 did not seem to fix the issue, even after verifying that the neo4j.conf file had changed.

Curious what the best option is to get going! Thanks!

Using an SSL certificate is a good option, but the other option you have is to open traffic on port 7474 and use regular HTTP instead of HTTPS. We enable HTTPS by default and not HTTP because of Amazon security requirements. If you are OK with passing details back and forth without encryption, then HTTP can be used without an SSL cert.

Similar to njho's message above, I am using Neo4j Graph Database - Community Edition from the Amazon Marketplace to set up an AMI.

When I try to use regular HTTP and port 7474 in Google Chrome, I get the "ServiceUnavailable: WebSocket connection failure" error, even though the command nc localhost 7687 successfully holds the port open.

I am using Google Chrome and I set dbms.connector.bolt.tls_level=OPTIONAL, but that doesn't help. And I can't figure out how to tell Chrome to "trust the server's certificate for the bolt port". This AMI uses Neo4j 3.5.1 and not Neo4j 3.5.0.

So the only remaining option is to set up a certificate, but I'm having trouble using the instructions at Getting Certificates for Neo4j with LetsEncrypt. The EC2 Management Console's 'Description' tab has an entry for the "Public DNS (IPv4)" of the instance. I tried to use this entry to satisfy the "must have a valid DNS address" requirement, but it causes LetsEncrypt to fail. I was also unable to understand how to use Google Domains and Google Cloud DNS.

Any advice would be appreciated. Also, I wonder why Amazon can't make it easy to create AMIs that have a certificate. You have said that "without knowing your local DNS configuration, the cloud image can't do the certificate bits for you." Doesn't the EC2 Management Console know the "local DNS configuration"? Shouldn't I be able to ssh into an instance and run a single command to set up the certificate?

Hi @socratic several things are going on here.

Unfortunately, SSL certs are fundamentally tied to domain names, and you can't get one for a bare IP address. That isn't a neo4j policy, just the rules of how SSL works. Now, the unfortunate interaction here is that the Neo4j browser app requires bolt connectivity, which interacts with the browser's security model, which is what's putting you in the situation where you need the cert. Sorry that this is a pain but there are elements of dynamic cloud setup & the browser security model here that are not really in Neo4j's control.

We can't automatically set up the certificate for you for several reasons. On AWS, not everybody gets a DNS name in the first place. When you do automatically get a DNS name, it's typically derived from your IP address (something like ec2-X-Y-Z-A, where X-Y-Z-A is your IP address). SSL certs are bound to specific host names, so issuing a cert for a hostname like this would not be a good idea: it would initially work, but if you stopped and restarted your VM you could end up with a different hostname, which would effectively break your certificate. Suppose we allowed you to enter your own custom domain name, for example with a CloudFormation template. Even then, for the automatic SSL setup to work, you would have had to pre-allocate the IP -> DNS name mapping so that the domain resolves when LetsEncrypt runs its validation probe. A bit more on that...

On the problems you're having with the certificate instructions: I don't think we can help without knowing exactly what the LetsEncrypt failure is. That said, by far the most common failures are either the wrong host name being given, or the probe port not being open. Again, this is LetsEncrypt behavior, not Neo4j, but to issue you a cert they require that you "prove" you control the domain. So you provide "mydomain.com" (or whatever), and their service tries to reach that address on the probe port. If that probe port is firewalled off (which it would be in the standard Neo4j deploy, which doesn't need that port), then your setup could fail. Something to double check.
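Before running LetsEncrypt, it can save time to confirm that the domain actually resolves to the instance's public IP. A minimal Python sketch; dns_points_at is a hypothetical helper, and the expected IP is whatever your cloud console shows for the instance. It only checks DNS from wherever you run it: whether the probe port is reachable (port 80 for LetsEncrypt's HTTP-01 challenge) has to be tested from outside your firewall.

```python
import socket


def resolved_ips(hostname: str) -> set:
    """All IPv4/IPv6 addresses the local resolver returns for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def dns_points_at(hostname: str, expected_ip: str) -> bool:
    """True if hostname currently resolves to expected_ip."""
    try:
        return expected_ip in resolved_ips(hostname)
    except socket.gaierror:
        # The name does not resolve at all (typo, or record not created yet).
        return False
```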

When you set dbms.connector.bolt.tls_level=OPTIONAL, clients are permitted to connect unencrypted, but this won't help you if, for example, Chrome insists that your connection be encrypted as part of its own security model.

As stated above if you want encrypted communications, Neo4j ships with an untrusted cert out of the box (it can't be trusted because it can't know what DNS name you'll have, and hence can't be signed by a CA). So you can either "trust the untrusted cert", or you can generate a trusted cert.

If you're launching an AMI on AWS, you don't need to do anything with Google Domains and Google Cloud DNS. Rather, on Amazon you'd use their Route 53 service to register a domain, and their Certificate Manager tool. On AWS you could generate the cert on your own (without LetsEncrypt) and end up with the same files you need for the SSL setup.

As for why you get "ServiceUnavailable" when you use port 7474 -- I just don't know, we'd need more information. The availability of port 7687 is one thing, but then tls_level is another, and firewalling configuration is a third.

I hope this helps.

For those who want to get a valid SSL certificate for Neo4j, please consult this article:


Thank you very much for your support. My server is now up and running with a valid SSL certificate, but somehow I am not able to connect to the database. This is the error:

WebSocket connection to 'wss://xxxxxxxxx.xx:7687/' failed: Error in connection establishment: net::ERR_ADDRESS_UNREACHABLE

Do you have any hints on how to solve this issue?

Thank you very much

P.S. After some changes in my docker container I see the following error in the logs

ERROR The RuntimeException could not be mapped to a response, re-throwing to the HTTP container Unable to construct bolt discoverable URI using 'null' as hostname: Expected scheme-specific part at index 5: bolt:

This is my first time setting up Neo4j on a cloud platform, so at the moment I've got no clue what is going wrong.

Can you give me some more information? What tool gives you this error, and when do you see it? The error literally means that the address is unreachable: you have probably provided an IP address that your machine cannot exchange packets with. So double check the address you enter, and check the firewall rules with your cloud provider to verify that they let the network traffic through.

Please say more about what your docker container is; otherwise it's hard to tell.

Hi David,

I'm running Neo4j (3.5.3 Enterprise) in a docker container on a Jelastic PaaS, along with an NGINX load balancer.

While connecting with Neo4j Browser, I get the error mentioned above and I'm not able to connect to the server.

As far as I can see, the server seems to work fine.

Do you need more information?

Thank you very much

The load balancer may be the issue here. Can you share its configuration? We typically don't use load balancers in front of Neo4j because of the way the routing protocol works. If it's a single instance you don't need a load balancer, and if it's a cluster, load balancers often interfere.

Also, this error is different from the one you reported above: "can't connect" and "address unreachable" are two different problems. When these things occur, it is most likely that traffic on port 7687 is not reaching Neo4j through some layer of your network config. I'm not sure I can help with this because it depends on how you've configured Jelastic, the load balancer, and the firewall. But this is where I would look.

O.K. David, thank you very much indeed. I'll set up a new environment without the load balancer and check how it works.

Thanks for your support

Hi,
The issue can be resolved with the following steps.

  1. In AWS Route 53, add a record for the Neo4j instance's IP address, and copy that hostname.
  2. Edit neo4j.conf and add one more line:
    dbms.connector.bolt.advertised_address=<host_name>:7687
  3. Stop the database and start it again. The bolt connection details will then appear automatically; log in using your credentials and you will be able to access the database.

Thanks,
Nithin.

Finally I've got the solution! Maybe it was too simple to be mentioned anywhere, but for me this was the key: I opened the address "https://domain:7687", added the certificate there once again manually, and voilà! Now everything is working fine.
I hope this would also help others not to struggle so long with this issue.


In my case it is surely a WebSocket timeout: this WebSocket connection failure pops up only when I run a heavy query through Neo4j Browser.

Here's the situation, my graph contains ~145000 nodes and ~152000 relationships, everything goes fine when I run a query through cypher-shell:

MATCH (a:Device {dID:"441076300000001025283863"}), (z:Device {dID:"441076300000001025271029"})
CALL apoc.path.expandConfig(a, {relationshipFilter:"CABLE_SEG|PIGTAIL", limit:500, terminatorNodes:[z]}) yield path as p
WHERE all(n IN relationships(p) where n.kx_count>0)
AND apoc.coll.containsAll([n IN nodes(p) | n.dID], ["441076300000001025283550"])
AND NOT ANY(n IN nodes(p) WHERE n.dID IN ["441076300000001025277680","441076300000001025277884"])
WITH p, apoc.coll.sum([x IN relationships(p) | x.stpCount]) as stpSum
RETURN p, stpSum

I checked the query plan: it took ~100000ms to return 14 rows from my graph, and it worked.

It also worked in Neo4j Browser when I lowered the limit parameter to 200 in the apoc.path.expandConfig() call.

So I believe it's a timeout issue on the WebSocket. My concern is whether this issue will also happen when I use the HTTP API to run a heavy query. What is the threshold, and where can I adjust it?
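One way to explore this outside the browser is to send the query to the transactional HTTP endpoint with an explicit client-side timeout. A minimal Python sketch, assuming Neo4j 4.0, where the endpoint lives under the /db/{databaseName}/tx path reported by the discovery document (on 3.x it is /db/data/transaction/commit). build_tx_request and run_statement are hypothetical helpers, and the timeout is enforced only by the client socket, not by Neo4j.

```python
import base64
import json
import urllib.request


def build_tx_request(base_url: str, database: str, statement: str,
                     user: str, password: str) -> urllib.request.Request:
    """Build a POST to /db/{database}/tx/commit (single-request transaction)."""
    url = f"{base_url}/db/{database}/tx/commit"
    body = json.dumps({"statements": [{"statement": statement}]}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def run_statement(req: urllib.request.Request, timeout_s: float) -> dict:
    """Send the request; timeout_s is a client-side socket timeout only."""
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)
```

urllib's timeout applies to each blocking socket operation (connecting, or waiting for the next bytes of the response), not to the total duration, so a query that sends nothing back for longer than timeout_s raises a timeout on the client even if it would eventually have completed on the server.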

I get that all sorts of fancy security measures are needed while attempting to run neo4j on AWS or complex configurations with containers and all that.

Is it asking too much for the docs and distribution to provide a step-by-step recipe for a dirt-simple hello-world smoke test?

I'm trying to get neo4j running on a reasonably vanilla-flavored CentOS 7 guest VM running in a vmware hypervisor on a Windows 10 Pro host. The underlying Windows 10 system is more than adequate.

I want to access neo4j using a standard web browser (Chrome or Firefox) running on the host.

This configuration, other than neo4j, has been working just fine for a long time. The guest has an httpd instance on port 80 that is robust and reliable. I shell into the guest with no issues. I use various host-based IDEs that use various sockets to communicate with the guest and all is fine.

I can't make neo4j work. I've been able to get exactly ONE request to produce a login screen -- that subsequently failed because the WS connection to bolt doesn't work. That's it. I attempted the config file change to make that work, and everything stopped working. I reverted it, and still everything is broken.

I've used systemctl to show that the neo4j service is up and running (it is). I've used firewall-cmd to show that ports 7474 and 7687 are open. They are. I've used netstat to confirm that the open ports have listeners. They do. I've confirmed the selinux is disabled. And yet it does nothing.

When I use curl from a shell connected to the guest, it appears that something is on 7474:

$ curl 127.0.0.1:7474
{
  "bolt_routing" : "neo4j://127.0.0.1:7687",
  "transaction" : "http://127.0.0.1:7474/db/{databaseName}/tx",
  "bolt_direct" : "bolt://127.0.0.1:7687",
  "neo4j_version" : "4.0.0",
  "neo4j_edition" : "community"
}

I haven't found any useful logs -- the "debug.log" in /var/log/neo4j has lots of useless startup and shutdown information and nothing of interest. The journalctl command is similarly unhelpful:

$ journalctl -e -u neo4j
-- Logs begin at Wed 2020-02-26 11:56:58 EST, end at Wed 2020-02-26 12:25:09 EST. --
Feb 26 11:57:24 localhost.localdomain systemd[1]: Started Neo4j Graph Database.
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: Directories in use:
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: home:         /var/lib/neo4j
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: config:       /etc/neo4j
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: logs:         /var/log/neo4j
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: plugins:      /var/lib/neo4j/plugins
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: import:       /var/lib/neo4j/import
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: data:         /var/lib/neo4j/data
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: certificates: /var/lib/neo4j/certificates
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: run:          /var/run/neo4j
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: Starting Neo4j.
Feb 26 11:57:38 localhost.localdomain neo4j[1381]: 2020-02-26 16:57:38.701+0000 WARN  Use of deprecated setting dbms.connectors.default_listen_address. It is replaced by dbms.default_listen_
Feb 26 11:57:38 localhost.localdomain neo4j[1381]: 2020-02-26 16:57:38.704+0000 WARN  Use of deprecated setting dbms.directories.certificates. Legacy ssl policy is no longer supported.
Feb 26 11:57:38 localhost.localdomain neo4j[1381]: 2020-02-26 16:57:38.786+0000 INFO  ======== Neo4j 4.0.0 ========
Feb 26 11:57:38 localhost.localdomain neo4j[1381]: 2020-02-26 16:57:38.791+0000 INFO  Starting...
Feb 26 11:58:05 localhost.localdomain neo4j[1381]: 2020-02-26 16:58:05.606+0000 INFO  Bolt enabled on 127.0.0.1:7687.
Feb 26 11:58:05 localhost.localdomain neo4j[1381]: 2020-02-26 16:58:05.607+0000 INFO  Started.
Feb 26 11:58:11 localhost.localdomain neo4j[1381]: 2020-02-26 16:58:11.455+0000 INFO  Remote interface available at http://localhost:7474/

Is it possible for somebody to provide a simple and straightforward recipe for performing a straightforward hello-world roundtrip to demonstrate that neo4j is installed and running on a local guest VM? If it can be done with Apache, the various IDEs, nodejs, react, angular, mongodb, and mysql, is it so very hard to do with neo4j?

@tms you're using 4.0 -- and I think you're running into an issue discussed / solved here: Cypher-shell certificates - #2 by david.allen

Is it easier to use an earlier version of neo4j?

I'm trying very hard to avoid getting sucked down the rathole of certs, domain names, and all that. I get that I need all that for a production server facing the jungle.

I'm running a guest VM on my own physical computer with several firewalls between it and the outside world. None of these communications require any complex authentication technology.

I appreciate the quick response, I'm not trying to bust your chops. I'm just looking for the quickest dirtiest path to doing some local experimentation with neo4j. I may even just punt this altogether and try connecting from running python code using the python drivers.

@tms the absolute quickest and dirtiest way to try Neo4j is neo4jsandbox.com, nothing to install or use.

Next, if you want to set it up yourself, I'd ask you to consider using one of the Cloud Marketplace launches. Beyond that, if you don't want a cluster, you should consider something like this: How to Automate Neo4j Deploys on Google Cloud Platform (GCP) | by David Allen | Neo4j Developer Blog | Medium (there's an equivalent article for AWS and Azure if you're there). Most of those options (as of this writing) will be 3.5 series.

If you're setting up and configuring everything yourself, on a VM of your own creation, using 4.0.0, your best option is to follow the guidance I linked above.

Ok, got it.

I note the following from an elevated command prompt on the Windows 10 pro host:

C:\WINDOWS\system32>curl http://192.168.242.128:80
<h1>Hello world</h1>

C:\WINDOWS\system32>curl http://192.168.242.128:7474
curl: (7) Failed to connect to 192.168.242.128 port 7474: Connection refused

The first shows that the host is able to communicate with the guest on port 80. The second shows (I think) that the guest won't respond on port 7474.

I guess this is consistent with the certificate issue you describe. I guess I'll make and install a self-signed certificate -- I've done that before, but in a different VM and environment.