Troubleshooting Connection Issues to Neo4j

(M. David Allen) #1

Troubleshooting Connection Issues to Neo4j (including Browser, Cypher Shell, and Driver Applications)

This post describes common issues users may encounter when connecting Neo4j Browser, cypher-shell, or a driver application to a Neo4j database, and how to address them.

Connection Timeout

Symptom: connection attempts lag for a long time, and then fail with connection timed out errors.

Example:

$ cypher-shell -a 37.204.217.197 -u neo4j -p myPassword
connection timed out: /37.204.217.197:7687

Troubleshooting steps:

  1. Ensure that the address is correct.
  2. If the server listens for bolt connections on a port other than 7687, pass that port explicitly to your client (e.g. cypher-shell) or to any program you have written.
  3. Ensure that firewall rules do not prohibit traffic on the bolt port.

Common causes of this error:

  1. A cloud instance of neo4j is launched with no security groups or port access defined. Bolt is available
    at the right address, but firewall rules prevent access. Packets are dropped, so the result is a connection timeout.
  2. Non-standard configuration of neo4j which runs bolt on a port other than 7687, for example to comply with local network policies.
  3. The server is not yet available. For a period of time while starting up, and particularly if the database is repairing files or migrating an old store, the bolt endpoint may not be available. You will know that it is available when the logs contain a message that looks like this: 2018-05-25 13:34:34.584+0000 INFO Bolt enabled on 127.0.0.1:7687.
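
A firewall that drops packets and a port with nothing listening tend to fail differently at the TCP level: the former typically times out, while the latter is usually refused straight away. The following is a minimal Python sketch of that check, using only the standard library. The function name `probe_port` and its return labels are illustrative and not part of any Neo4j tooling, and the check only tests TCP reachability; it does not speak bolt.

```python
import socket


def probe_port(host, port, timeout=5.0):
    """Attempt a plain TCP connection and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <detail>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all: often a firewall silently dropping packets.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on this port.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or no route to host.
        return "error: {}".format(exc)
```

A "timeout" result points towards firewall rules or a wrong address, while "refused" suggests the host is reachable but bolt is not listening on that port (wrong port, or the server is still starting up).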

ServiceUnavailable: WebSocket connection failure

A similar message you might see is: WebSocket connection failure. Due to security constraints in your web browser, the reason for the failure is not available to this Neo4j Driver.

Symptom: you can connect to Neo4j Browser and enter credentials, but fail to connect with a message
about WebSocket connection failures.

It looks like this: https://imgur.com/3Y7NBDg.png

Explanation: this is commonly seen with Firefox and some versions of Internet Explorer, when Neo4j Browser
is used with an untrusted SSL certificate. When users click to accept the exception and permit traffic, those
browsers authorize that action for only the port that Neo4j Browser is running on, not for all ports on that
host. As a result, the browser's security policy fails the WebSocket connection to the bolt port.

Available Resolutions:

  1. Use a signed SSL certificate (follow these directions to generate certificates)
  2. Follow directions for your browser to trust the server's certificate for the bolt port, and then refresh the page.
  3. Use Chrome
  4. Set dbms.connector.bolt.tls_level=OPTIONAL in your neo4j config. This side-steps the web browser's problem with
    the untrusted certificate, but be aware that bolt connections may then be unencrypted.

If using a signed SSL certificate is not an option for you, you must configure your browser to trust the untrusted certificate on both port 7473 (HTTPS) and port 7687 (bolt). Configuring trust just for HTTPS is insufficient for browsers that enforce trust per-port instead of per-host (such as Firefox). Consult the help documentation for your browser to determine how to do this, as it varies depending on your browser and operating system.

Failed to Establish Connection in (5000)ms.

When a driver attempts to connect to the server, it has a default amount of time that it will wait for a response from the server before giving up. When you get this message, it generally means that you did make a connection to the server, but the server wasn't responsive within that timeout window. The figure may not be 5000ms: this is a configurable driver setting, and the value depends on which language driver you're using and on your local configuration.

A common reason for this error is that your Neo4j instance is under heavy load. For example, if you're running a query that is about to result in an Out of Memory error, you may run into this error. Another possibility is extremely high network latency between your machine and the Neo4j instance, for example if you're on a low quality wifi link.
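
One rough way to tell these two causes apart is to time the bare TCP handshake. It is answered by the server's operating system, so it is largely unaffected by how busy Neo4j itself is. The sketch below uses only the Python standard library; `tcp_connect_seconds` is a hypothetical helper name and not part of any driver.

```python
import socket
import time


def tcp_connect_seconds(host, port, timeout=5.0):
    """Return the seconds taken to complete a TCP handshake with host:port.

    Raises OSError (including socket.timeout) if the connection fails.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        return time.monotonic() - start
```

If the handshake completes in a few milliseconds while the driver still times out, the delay is more likely on the server side than in the network. A handshake that itself takes a large fraction of the timeout points towards latency.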


(Nigel Small) #2

I'd highlight several other things here too.

Firstly, check and double check that you have the right port numbers as well as the right configuration. Port 7687 is the default port for Bolt (both secure and insecure) and 7474 is the default port for (insecure) HTTP. Secure HTTP defaults to port 7473. I've seen these mixed up multiple times, and have also seen attempts at secure connections to insecure servers and vice versa.

Use telnet or nc from the command line to test if a port is open and listening. These should both hold the connection open if one can be established, and return immediately if not.

$ nc localhost 7687

You can also use curl to check for HTTP:

$ curl localhost:7474
{
  "management" : "http://localhost:7474/db/manage/",
  "data" : "http://localhost:7474/db/data/",
  "bolt" : "bolt://localhost:7687"
}
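
The "bolt" field in that response is the Bolt address the server advertises, which is worth comparing against the address your client is actually using. Here is a small Python sketch that pulls the host and port out of such a response. `bolt_endpoint` is a hypothetical helper name, and the JSON shape is assumed to match the example output above.

```python
import json
from urllib.parse import urlsplit


def bolt_endpoint(discovery_json):
    """Extract (host, port) of the advertised Bolt address from a discovery response."""
    doc = json.loads(discovery_json)
    parts = urlsplit(doc["bolt"])
    return parts.hostname, parts.port
```

If the advertised host is something like localhost while you are connecting from another machine, the client may be directed to an address it cannot reach.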

Look for potential IPv4 vs IPv6 issues. If you're using localhost, this can resolve differently based on your operating system. Trying to connect to 127.0.0.1:7687 or [::1]:7687 explicitly can help diagnose name resolution issues with localhost:7687. If your network is IPv4 only, your neo4j.conf file should contain 0.0.0.0 for your dbms.connectors.default_listen_address. If you want to listen out for both IPv4 and IPv6 connections this will need to be set to :: (double colon) instead.
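
To see how a name such as localhost resolves on a particular machine, you can ask the operating system's resolver directly. This is a minimal sketch using Python's standard library; `resolve_addresses` is an illustrative name, not an existing tool.

```python
import socket


def resolve_addresses(host, port):
    """Return the distinct (family, address) pairs that host resolves to for TCP."""
    labels = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    seen = []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        entry = (labels.get(family, str(family)), sockaddr[0])
        if entry not in seen:
            seen.append(entry)
    return seen
```

Calling `resolve_addresses("localhost", 7687)` lists the addresses in the order the resolver returns them. If an IPv6 address comes first while the server only listens on IPv4, that mismatch is a likely culprit.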

Lastly, multiple layers of network hardware can have complex interactions, particularly around timeouts. If you're seeing dropped connections, could there be something in your network that kills these if it thinks they're idle? Does it work locally, but not on AWS? Amazon has some quite aggressive connection-killing rules by default and it also ignores the TCP_KEEPALIVE setting that we enable.