
Troubleshooting Connection Issues to Neo4j

david_allen
Neo4j

Troubleshooting Connection Issues to Neo4j (including Browser, Cypher Shell, and Driver Applications)

This post describes common issues users may encounter when connecting Neo4j Browser, cypher-shell, or driver applications to a Neo4j database, and how to address them.

Connection Timeout

Symptom: connection attempts hang for a long time and then fail with "connection timed out" errors.

Example:

$ cypher-shell -a 37.204.217.197 -u neo4j -p myPassword
connection timed out: /37.204.217.197:7687

Troubleshooting steps:

  1. Ensure that the address is correct.
  2. If the server is listening for Bolt connections on a port other than 7687, ensure that you pass that port explicitly to your client (e.g. cypher-shell) or to any program you have written.
  3. Ensure that firewall rules do not prohibit traffic on the bolt port.
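The steps above can be checked from a script before involving any Neo4j client. This is a minimal sketch using only Python's standard library; the helper name `probe_tcp` and the 5-second default are illustrative choices, not part of any Neo4j tool. A timeout usually points at dropped packets (a firewall or a wrong address), while a refusal usually means the host answered but nothing is listening on that port.

```python
import socket


def probe_tcp(host: str, port: int = 7687, timeout: float = 5.0) -> str:
    """Classify a plain TCP connection attempt to host:port.

    Returns "open", "refused", "timeout", or "error: <reason>".
    """
    try:
        # create_connection resolves the name and tries each resolved address
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host replied with a reset: usually nothing listens on this port
        return "refused"
    except socket.timeout:
        # No reply at all: packets are probably being dropped (e.g. by a firewall)
        return "timeout"
    except OSError as exc:
        # Name resolution failures, unreachable networks, and similar
        return f"error: {exc}"
```

For instance, `probe_tcp("37.204.217.197")` returning "timeout" would be consistent with the cypher-shell error shown above, whereas "refused" would suggest that nothing is listening on 7687 at that address.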

Common causes of this error:

  1. A cloud instance of Neo4j is launched with no security groups or port access defined. Bolt is available
    at the right address, but firewall rules prevent access. Packets are dropped, so the result is a connection timeout.
  2. Non-standard configuration of neo4j which runs bolt on a port other than 7687, for example to comply with local network policies.
  3. The server is not yet available. For a period of time while starting up, and particularly if the database is repairing files or migrating an old store, the bolt endpoint may not be available. You will know that it is available when the logs contain a message that looks like this: 2018-05-25 13:34:34.584+0000 INFO Bolt enabled on 127.0.0.1:7687.
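If you script around startup (for example in a health check), you can look for that log line instead of guessing. A minimal Python sketch: `bolt_ready_address` is a hypothetical helper, and the regular expression assumes the message keeps the wording shown above, which may differ between Neo4j versions. It accepts any iterable of lines, such as an open neo4j.log file or captured journalctl output.

```python
import re
from typing import Iterable, Optional

# Assumed wording, taken from the example message above:
# "2018-05-25 13:34:34.584+0000 INFO Bolt enabled on 127.0.0.1:7687."
BOLT_READY = re.compile(r"\bINFO\s+Bolt enabled on (?P<addr>\S+?)\.?$")


def bolt_ready_address(log_lines: Iterable[str]) -> Optional[str]:
    """Return the address from the last 'Bolt enabled on' line, or None if absent."""
    found = None
    for line in log_lines:
        match = BOLT_READY.search(line.rstrip())
        if match:
            found = match.group("addr")  # keep the most recent startup
    return found
```

The returned address is also worth reading closely: a value such as 127.0.0.1:7687 means Bolt is listening on the loopback interface only.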

ServiceUnavailable: WebSocket connection failure

A similar message you might see is: WebSocket connection failure. Due to security constraints in your web browser, the reason for the failure is not available to this Neo4j Driver.

Symptom: you can connect to Neo4j Browser and enter credentials, but fail to connect with a message
about WebSocket connection failures.

It looks like this: https://imgur.com/3Y7NBDg.png

Explanation: this is commonly seen with Firefox and some versions of Internet Explorer, when Neo4j Browser
is used with an untrusted SSL certificate. When users click to accept the exception and permit traffic, those
browsers authorize that action for only the port that Neo4j Browser is running on, not for all ports on that
host. As a result, the browser's security policy fails the WebSocket connection to the bolt port.

Available Resolutions:

  1. Use a signed SSL certificate (follow these directions to generate certificates)
  2. Follow the directions for your browser to trust the server's certificate for the Bolt port, and then refresh the page. (In Chrome, this can be forced by going to https://your-host:7687 and accepting the certificate, even though the Bolt port does not really serve HTTPS.)
  3. Use Chrome
  4. Set dbms.connector.bolt.tls_level=OPTIONAL in your Neo4j config. Be aware that Bolt connections may then be
    unencrypted, but this is a way of side-stepping web browser issues with the untrusted certificate.
  5. If you are using Neo4j 3.5.0 specifically (and only that version), this can be caused by a bug and this work-around is available.

If you are using Neo4j 4.0, be aware that some defaults have changed. We also recommend these settings for self-signed certificates in 4.0:

     dbms.ssl.policy.bolt.client_auth=NONE
     dbms.ssl.policy.https.client_auth=NONE

If using a signed SSL certificate is not an option for you, you must configure your browser to trust the self-signed certificate on both port 7473 (HTTPS) and port 7687 (Bolt). Configuring trust just for HTTPS is insufficient for browsers that enforce trust per port instead of per host (such as Firefox). Consult the help documentation for your browser to determine how to do this, as it varies depending on your browser and operating system.

Failed to Establish Connection in (5000)ms.

When a driver attempts to connect to the server, it waits a default amount of time for a response before giving up. When you get this message, it generally means that you did make a connection to the server, but the server didn't respond within that timeout window. The timeout may not be 5000ms: it is a configurable driver setting whose value depends on which language driver you're using and on your local configuration.

A common reason why this error occurs is that your Neo4j instance is under heavy load. For example, if you're running a query that is soon going to result in an Out of Memory error, you could run into this error. Another possibility is extremely high network latency between your machine and the Neo4j instance, for example if you're on a low-quality Wi-Fi link.
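To tell which stage is slow, you can time the TCP connect and the server's first reply separately. The Python sketch below uses only the standard library. It opens a TCP connection and then sends a Bolt handshake: the 4-byte preamble 0x6060B017 followed by four 32-bit version proposals, to which the server answers with the 4-byte version it picked (0 if none match). The proposed versions, the helper names, and the 5-second window are assumptions for illustration, and a real driver does much more (encryption, authentication, routing). Against a server that requires encrypted Bolt, this plain-text probe will not get a normal reply.

```python
import socket
import struct
import time
from typing import Optional, Tuple

# Bolt preamble, sent before the version proposals.
BOLT_MAGIC = b"\x60\x60\xb0\x17"
# Four 32-bit big-endian proposals, most preferred first. The choice of
# versions (5.0, 4.4, 4.0, 3) is an assumption made for this sketch.
PROPOSALS = (0x00000005, 0x00000404, 0x00000004, 0x00000003)


def handshake_bytes() -> bytes:
    """The 20 bytes a client sends right after the TCP connection opens."""
    return BOLT_MAGIC + struct.pack(">4I", *PROPOSALS)


def parse_reply(reply: bytes) -> Optional[Tuple[int, int]]:
    """Decode the server's 4-byte answer as (major, minor); None means no match."""
    (version,) = struct.unpack(">I", reply)
    if version == 0:
        return None
    return (version & 0xFF, (version >> 8) & 0xFF)


def bolt_handshake(host: str, port: int = 7687, timeout: float = 5.0):
    """Time the TCP connect and the handshake reply separately.

    TCP-level failures (refused, timed out) propagate to the caller.
    Returns (stage, version_or_None, connect_seconds).
    """
    started = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        connect_s = round(time.monotonic() - started, 3)
        sock.sendall(handshake_bytes())
        reply = b""
        try:
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:
                    return ("closed", None, connect_s)
                reply += chunk
        except socket.timeout:
            # Connected, but the server did not answer inside the window
            return ("no_reply", None, connect_s)
    version = parse_reply(reply)
    return ("agreed", version, connect_s) if version else ("rejected", None, connect_s)
```

A quick connect followed by "no_reply" matches the situation described above: the connection was made, but the server did not answer in time.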


technige
Graph Buddy

I'd highlight several other things here too.

Firstly, check and double check that you have the right port numbers as well as the right configuration. Port 7687 is the default port for Bolt (both secure and insecure) and 7474 is the default port for (insecure) HTTP. Secure HTTP defaults to port 7473. I've seen these mixed up multiple times, and have also seen attempts to make secure connections to insecure servers and vice versa.

Use telnet or nc from the command line to test if a port is open and listening. These should both hold the connection open if one can be established, and return immediately if not.

$ nc localhost 7687

You can also use curl to check for HTTP:

$ curl localhost:7474
{
  "management" : "http://localhost:7474/db/manage/",
  "data" : "http://localhost:7474/db/data/",
  "bolt" : "bolt://localhost:7687"
}
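The same discovery document can be read from a script. The key names differ by version: the 3.x response above has "bolt", while a 4.0 server reports "bolt_direct" and "bolt_routing" instead. A minimal Python sketch using only the standard library; the function names are hypothetical, and sending an Accept header of application/json is an assumption about how to request the JSON form of the page.

```python
import json
from typing import Optional
from urllib.request import Request, urlopen


def bolt_uri_from_discovery(body: str) -> Optional[str]:
    """Pick the direct Bolt URI out of the JSON served at http://host:7474/."""
    info = json.loads(body)
    for key in ("bolt_direct", "bolt"):  # 4.0 key first, then the 3.x key
        if key in info:
            return info[key]
    return None


def fetch_discovery(base_url: str = "http://localhost:7474", timeout: float = 5.0) -> str:
    """Rough equivalent of `curl localhost:7474`: return the raw discovery JSON."""
    req = Request(base_url + "/", headers={"Accept": "application/json"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage: `print(bolt_uri_from_discovery(fetch_discovery()))`. Comparing the returned host and port with the address your client actually uses can reveal a mismatch.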

Look for potential IPv4 vs IPv6 issues. If you're using localhost, it can resolve differently depending on your operating system. Trying to connect to 127.0.0.1:7687 or [::1]:7687 explicitly can help diagnose name resolution issues with localhost:7687. If your network is IPv4 only, your neo4j.conf file should contain 0.0.0.0 for your dbms.connectors.default_listen_address (renamed dbms.default_listen_address in 4.0). If you want to listen for both IPv4 and IPv6 connections, set this to :: (double colon) instead.
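To see what localhost (or any other name) actually resolves to on your machine, and in which order a client will try the results, you can ask the resolver directly. A minimal Python sketch with a hypothetical helper name; the order returned depends on your operating system's resolver configuration.

```python
import socket


def resolved_endpoints(host: str, port: int = 7687):
    """Return (IP version, address) pairs for host, in getaddrinfo order."""
    seen = []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        if family == socket.AF_INET6:
            label = "IPv6"
        elif family == socket.AF_INET:
            label = "IPv4"
        else:
            label = str(family)
        entry = (label, sockaddr[0])
        if entry not in seen:  # getaddrinfo can repeat an address
            seen.append(entry)
    return seen
```

For example, if `resolved_endpoints("localhost")` lists ::1 before 127.0.0.1 while Neo4j only listens on IPv4, clients that try the first address will fail before falling back, if they fall back at all.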

Lastly, multiple layers of network hardware can have complex interactions, particularly around timeouts. If you're seeing dropped connections, could there be something in your network that kills these if it thinks they're idle? Does it work locally, but not on AWS? Amazon has some quite aggressive connection-killing rules by default and it also ignores the TCP_KEEPALIVE setting that we enable.

njho
Node Link

@david.allen

Currently trying to launch the Community AMI with Neo4j browser v3.2.10 & Neo4j 3.4.9 on Google Chrome.

Are the solutions to ServiceUnavailable: WebSocket connection failure still working?
Is using an SSL certificate the only way to get going using the Community AMI?

Solutions Tried:

  1. Follow directions for your browser to trust the server's certificate for the bolt port, and then refresh the page.
    Does not work. http://ec2-54-xxx-xxx-111.compute-1.amazonaws.com:7687/ on Chrome returns: not a WebSocket handshake request: missing upgrade, and there is not an option to trust this certificate.
  2. Use Google Chrome
    Currently using Google Chrome - Does not work
  3. Set dbms.connector.bolt.tls_level=OPTIONAL in your neo4j config.
    Does this work? Updating this setting and dbms.connector.bolt.address=0.0.0.0:7687 did not seem to fix the issue, even after verifying that the neo4j.conf file had changed.

Curious what the best option is to get going! Thanks!

Using an SSL certificate is a good option, but the other option you have is to open traffic on port 7474 and use regular HTTP instead of HTTPS. We enable HTTPS by default and not HTTP because of Amazon security requirements. If you are OK with passing details back and forth without encryption, then HTTP can be used without an SSL cert.

socratic
Node Link

Similar to njho's message above, I am using Neo4j Graph Database - Community Edition from the Amazon Marketplace to set up an AMI.

When I try to use regular HTTP and port 7474 in Google Chrome, I get the "ServiceUnavailable: WebSocket connection failure" error, even though the command nc localhost 7687 successfully holds the port open.

I am using Google Chrome and I set dbms.connector.bolt.tls_level=OPTIONAL, but that doesn't help. And I can't figure out how to tell Chrome to "trust the server's certificate for the bolt port". This AMI uses Neo4j 3.5.1 and not Neo4j 3.5.0.

So the only remaining option is to set up a certificate, but I'm having trouble using the instructions at Getting Certificates for Neo4j with LetsEncrypt. The EC2 Management Console's 'Description' tab has an entry for the "Public DNS (IPv4)" of the instance. I tried to use this entry to satisfy the "must have a valid DNS address" requirement, but it causes LetsEncrypt to fail. And I was unable to understand how to use Google Domains and Google Cloud DNS.

Any advice would be appreciated. Also, I wonder why Amazon can't make it easy to create AMIs that have a certificate. You have said that "without knowing your local DNS configuration, the cloud image can't do the certificate bits for you." Doesn't the EC2 Management Console know the "local DNS configuration"? Shouldn't I be able to ssh into an instance and run a single command to set up the certificate?

Hi @socratic several things are going on here.

Unfortunately, SSL certs are fundamentally tied to domain names, and you can't get one for a bare IP address. That isn't a Neo4j policy, just the rules of how SSL works. The unfortunate interaction here is that the Neo4j Browser app requires Bolt connectivity, and Bolt connectivity is subject to the browser's security model; that is what puts you in the situation where you need the cert. Sorry that this is a pain, but elements of dynamic cloud setup and the browser security model here are not really in Neo4j's control.

We can't automatically set up the certificate for you for several reasons. On AWS, not everybody gets a DNS name in the first place. When you do automatically get a DNS name, it's typically mapped to your IP address (the address ends up being something like ec2-X-Y-Z-A, where X-Y-Z-A is your IP address). SSL certs are bound to specific host names, so issuing a cert for a hostname like this would not be a good idea: it would initially work, but if you stopped and restarted your VM, you could end up with a different hostname, which would effectively break your certificate. Suppose we let you enter your own custom domain name, for example with a CloudFormation template. Even then, for the SSL auto-setup to work, you would need to have mapped that DNS name to the instance's IP in advance, so that the test probe LetsEncrypt performs could resolve it. A bit more on that...

On the problems you're having with the certificate instructions -- I don't think we can help without knowing exactly what the LetsEncrypt failure is. That said, by far the most common failures are giving the wrong host name or not having the probe port open. Again, this is LetsEncrypt behavior, not Neo4j, but to issue you a cert they require that you "prove" you control the domain. So you submit "mydomain.com" (or whatever) and their service tries to reach that address on the probe port. If that probe port is firewalled off (which it would be in the standard Neo4j deploy, which doesn't need that port), your setup could fail. Something to double check.

When you set dbms.connector.bolt.tls_level=OPTIONAL, this permits clients to connect unencrypted, but it won't help if, for example, Chrome insists that your connection be encrypted as part of its own security model.

As stated above, Neo4j ships with an untrusted cert out of the box (it can't be trusted because it can't know what DNS name you'll have, and hence can't be signed by a CA). So if you want encrypted communications, you can either "trust the untrusted cert" or generate a trusted cert.

If you're launching an AMI on AWS, you don't need to do anything with google domains and google cloud DNS. Rather if you're on Amazon, you'd be using their Route 53 service to register a domain, and their Certificate Manager tool. On AWS you could generate the cert on your own (without letsencrypt) and end up with the same files you need to do the SSL setup.

As for why you get "ServiceUnavailable" when you use port 7474 -- I just don't know, we'd need more information. The availability of port 7687 is one thing, but then tls_level is another, and firewalling configuration is a third.

I hope this helps.

david_allen
Neo4j

For those who want to get a valid SSL certificate for Neo4j, please consult this article:

Thank you very much for your support. Now I have my server up and running with a valid SSL certificate, but somehow I am not able to connect to the database. This is the error:

WebSocket connection to 'wss://xxxxxxxxx.xx:7687/' failed: Error in connection establishment: net::ERR_ADDRESS_UNREACHABLE

Do you have any hints on how to solve this issue?

Thank you very much

P.S. After some changes in my docker container I see the following error in the logs

ERROR The RuntimeException could not be mapped to a response, re-throwing to the HTTP container Unable to construct bolt discoverable URI using 'null' as hostname: Expected scheme-specific part at index 5: bolt:

This is the first time for me setting up Neo4J on a cloud platform, so at the moment I've got no clue what is going wrong

Can you give me some more information? What tool gives you this error, and when do you see it? The error literally means that the address is unreachable: you've probably provided an IP address that it can't exchange packets with. So double check the address you enter, and check the firewall rules with your cloud provider to verify that the firewall lets the network traffic through.

Please say more about what your docker container is, otherwise it's hard to tell

Hi David,

I’m running Neo4j (3.5.3 Enterprise) on a Jelastic PaaS in a Docker container, along with an NGINX load balancer.

While connecting to Neo4j Browser, I get the mentioned error and I’m not able to connect to the server.

As far as I can see, the server seems to work fine.

Do you need more information?

Thank you very much

The load balancer may be the issue here. Can you share its configuration? We typically don't use load balancers in front of Neo4j because of the way the routing protocol works. If it's a single instance you don't need a load balancer, and if it's a cluster, then load balancers often interfere.

Also, this error is different than the one you reported above. Can't connect and address unroutable are two different problems. When these things occur, it is most likely the case that port 7687 is not getting to Neo4j through some layer of your network config. I'm not sure I can help with this because it depends on how you've configured Jelastic, the load balancer, and the firewall. But this is where I would look.

Finally I've got the solution! Maybe it was too simple to be mentioned anywhere, but for me this was the key: I opened the address "https://domain:7687", added the certificate there once again manually, and voilà! Now everything is working fine.
I hope this would also help others not to struggle so long with this issue.

shahriar_fakher
Node Link

O.K. David, thank you very much indeed. I'll set up a new environment without the loadbalancer and check how it works.

Thanks for your support

nithin34_it
Node Clone

Hi,
The issue can be resolved with these steps:

  1. In AWS Route 53, create a record for the Neo4j instance's IP address, and copy that hostname.
  2. Edit neo4j.conf and add one more line:
    dbms.connector.bolt.advertised_address=<host_name>:7687
  3. Stop the database and start it again. The Bolt connection details will then appear automatically; log in with your credentials and you will be able to access the database.

Thanks,
Nithin.

Dear nithin,

I want to change the configuration of the Neo4j database, but the files are all owned by the neo4j OS user. The AWS configuration provides an ubuntu user to connect to the EC2 instance.
So how can you restart the server as well as change the database configuration?
Do you need to log in to the EC2 instance as the neo4j user?
Is there a default password for the neo4j user?

Thanks in advance

On AWS there is no password for users. You connect by secure SSH keys generated by AWS.

To restart the system service, run systemctl restart neo4j; any configuration changes you have made will be picked up.

Thanks David,

I can restart the neo4j service. However, I cannot update the neo4j.conf file due to lack of permission.
Could you give me further advice?

I also want to move /data folder to a bigger storage folder but I also have no permission to create new folder and move data files to the new folder.


I found a way to do this with the sudo command.
Thanks for your help.

wilsli
Node

In my case it is surely a WebSocket timeout: this WebSocket connection failure pops up only when I run a heavy query through Neo4j Browser.

Here's the situation: my graph contains ~145000 nodes and ~152000 relationships, and everything goes fine when I run the query through cypher-shell:

MATCH (a:Device {dID:"441076300000001025283863"}), (z:Device {dID:"441076300000001025271029"})
CALL apoc.path.expandConfig(a, {relationshipFilter:"CABLE_SEG|PIGTAIL", limit:500, terminatorNodes:[z]}) yield path as p
WHERE all(n IN relationships(p) where n.kx_count>0)
AND apoc.coll.containsAll([n IN nodes(p) | n.dID], ["441076300000001025283550"])
AND NOT ANY(n IN nodes(p) WHERE n.dID IN ["441076300000001025277680","441076300000001025277884"])
WITH p, apoc.coll.sum([x IN relationships(p) | x.stpCount]) as stpSum
RETURN p, stpSum

I checked the query plan: it took ~100000ms to get 14 rows of results from my graph, and it worked.

It also went fine when I lowered the limit parameter to 200 in the apoc.path.expandConfig() call in Neo4j Browser.

So I believe it's a timeout issue on the WebSocket. My concern is whether this issue will also happen when I use the HTTP API to run a heavy query. What is the threshold, and where can I adjust it?

tms
Graph Buddy

I get that all sorts of fancy security measures are needed while attempting to run neo4j on AWS or complex configurations with containers and all that.

Is it asking too much for the docs and distribution to provide a step-by-step recipe for a dirt-simple hello-world smoke test?

I'm trying to get neo4j running on a reasonably vanilla-flavored CentOS 7 guest VM running in a vmware hypervisor on a Windows 10 Pro host. The underlying Windows 10 system is more than adequate.

I want to access neo4j using a standard web browser (Chrome or Firefox) running on the host.

This configuration, other than neo4j, has been working just fine for a long time. The guest has an httpd instance on port 80 that is robust and reliable. I shell into the guest with no issues. I use various host-based IDEs that use various sockets to communicate with the guest and all is fine.

I can't make neo4j work. I've been able to get exactly ONE request to produce a login screen -- that subsequently failed because the WS connection to bolt doesn't work. That's it. I attempted the config file change to make that work, and everything stopped working. I reverted it, and still everything is broken.

I've used systemctl to show that the neo4j service is up and running (it is). I've used firewall-cmd to show that ports 7474 and 7687 are open. They are. I've used netstat to confirm that the open ports have listeners. They do. I've confirmed the selinux is disabled. And yet it does nothing.

When I use curl from a shell connected to the guest, it appears that something is on 7474:

$ curl 127.0.0.1:7474
{
  "bolt_routing" : "neo4j://127.0.0.1:7687",
  "transaction" : "http://127.0.0.1:7474/db/{databaseName}/tx",
  "bolt_direct" : "bolt://127.0.0.1:7687",
  "neo4j_version" : "4.0.0",
  "neo4j_edition" : "community"
}

I haven't found any useful logs -- the "debug.log" in /var/log/neo4j has lots of useless startup and shutdown information and nothing of interest. The journalctl command is similarly unhelpful:

$ journalctl -e -u neo4j
-- Logs begin at Wed 2020-02-26 11:56:58 EST, end at Wed 2020-02-26 12:25:09 EST. --
Feb 26 11:57:24 localhost.localdomain systemd[1]: Started Neo4j Graph Database.
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: Directories in use:
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: home:         /var/lib/neo4j
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: config:       /etc/neo4j
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: logs:         /var/log/neo4j
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: plugins:      /var/lib/neo4j/plugins
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: import:       /var/lib/neo4j/import
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: data:         /var/lib/neo4j/data
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: certificates: /var/lib/neo4j/certificates
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: run:          /var/run/neo4j
Feb 26 11:57:28 localhost.localdomain neo4j[1381]: Starting Neo4j.
Feb 26 11:57:38 localhost.localdomain neo4j[1381]: 2020-02-26 16:57:38.701+0000 WARN  Use of deprecated setting dbms.connectors.default_listen_address. It is replaced by dbms.default_listen_
Feb 26 11:57:38 localhost.localdomain neo4j[1381]: 2020-02-26 16:57:38.704+0000 WARN  Use of deprecated setting dbms.directories.certificates. Legacy ssl policy is no longer supported.
Feb 26 11:57:38 localhost.localdomain neo4j[1381]: 2020-02-26 16:57:38.786+0000 INFO  ======== Neo4j 4.0.0 ========
Feb 26 11:57:38 localhost.localdomain neo4j[1381]: 2020-02-26 16:57:38.791+0000 INFO  Starting...
Feb 26 11:58:05 localhost.localdomain neo4j[1381]: 2020-02-26 16:58:05.606+0000 INFO  Bolt enabled on 127.0.0.1:7687.
Feb 26 11:58:05 localhost.localdomain neo4j[1381]: 2020-02-26 16:58:05.607+0000 INFO  Started.
Feb 26 11:58:11 localhost.localdomain neo4j[1381]: 2020-02-26 16:58:11.455+0000 INFO  Remote interface available at http://localhost:7474/

Is it possible for somebody to provide a simple and straightforward recipe for performing a straightforward hello-world roundtrip to demonstrate that neo4j is installed and running on a local guest VM? If it can be done with Apache, the various IDEs, nodejs, react, angular, mongodb, and mysql, is it so very hard to do with neo4j?

@tms you're using 4.0 -- and I think you're running into an issue discussed / solved here: Cypher-shell certificates

Is it easier to use an earlier version of neo4j?

I'm trying very hard to avoid getting sucked down the rathole of certs, domain names, and all that. I get that I need all that for a production server facing the jungle.

I'm running a guest VM on my own physical computer with several firewalls between it and the outside world. None of these communications require any complex authentication technology.

I appreciate the quick response, I'm not trying to bust your chops. I'm just looking for the quickest dirtiest path to doing some local experimentation with neo4j. I may even just punt this altogether and try connecting from running python code using the python drivers.

@tms the absolute quickest and dirtiest way to try Neo4j is neo4jsandbox.com, with nothing to install or set up.

After that, if you want to set it up yourself, I'd ask you to consider using one of the Cloud Marketplace launches. After that, if you don't want a cluster, you should consider something like this: https://medium.com/neo4j/how-to-automate-neo4j-deploys-on-google-cloud-platform-gcp-6e123eccfd5e (there's an equivalent article for AWS and Azure if you're there). Most of those options (as of this writing) will be 3.5 series.

Setting up and configuring everything yourself, on a VM of your own creation, using 4.0.0 - your best option is to follow the guidance I linked above.

Ok, got it.

I note the following from an elevated command prompt on the Windows 10 pro host:

C:\WINDOWS\system32>curl http://192.168.242.128:80
<h1>Hello world</h1>

C:\WINDOWS\system32>curl http://192.168.242.128:7474
curl: (7) Failed to connect to 192.168.242.128 port 7474: Connection refused

The first shows that the host is able to communicate with the guest on port 80. The second shows (I think) that the guest won't respond on port 7474.

I guess this is consistent with the certificate issue you describe. I guess I'll make and install a self-signed certificate -- I've done that before, but in a different VM and environment.

tms
Graph Buddy

This turned out to be a configuration issue in neo4j.conf.

For some reason, the following line was commented out:

dbms.connector.http.enabled=true

Not surprisingly, the server ignored 7474 while configured that way. I don't remember doing that, I wonder if perhaps the default distribution comes that way?
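A commented-out line like that is easy to miss in a long neo4j.conf. The Python sketch below, with a hypothetical helper name, reports whether a key is active, commented out, or absent. It assumes the plain key=value format of neo4j.conf and does not know Neo4j's built-in defaults, so "commented out" or "absent" only tells you the server will fall back to its default for that version. When a key appears more than once, this sketch simply reports the last active value.

```python
import re

# Optional leading "#", then key=value; e.g. "#dbms.connector.http.enabled=true"
SETTING = re.compile(r"^\s*(#\s*)?([A-Za-z][\w.]*)\s*=\s*(.*?)\s*$")


def setting_status(conf_text: str, key: str) -> str:
    """Report 'active: <value>', 'commented out', or 'absent' for key."""
    active = None
    commented = False
    for line in conf_text.splitlines():
        match = SETTING.match(line)
        if not match or match.group(2) != key:
            continue
        if match.group(1):
            commented = True
        else:
            active = match.group(3)  # later active lines overwrite earlier ones here
    if active is not None:
        return "active: " + active
    return "commented out" if commented else "absent"
```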

Anyway, once I turned it on the neo browser started working. I'm still at v4.0.0.0, by the way. I've left authentication turned off:

dbms.security.auth_enabled=false

Both Chrome and Firefox seem to be doing just fine, neither is complaining.

davisford
Node Link

@david.allen I have a single node neo4j:4.0.1-enterprise in a Kubernetes cluster. It is behind a load balancer and we are using Ingress to expose the browser and the bolt connection via the following configuration:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: alma-ingress
spec:
  rules:
    - host: neo4j.foo.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: neo4j
              servicePort: 7474
    - host: bolt.foo.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: neo4j
              servicePort: 7687

This type of configuration had worked fine for us on neo4j:3.5-enterprise, so we could connect to the browser just fine. We are using Amazon certs so SSL/TLS is legit and not a problem.

When we upgraded to 4.0 this broke. Our load balancer exposes only two ports: 80, 443. Our Ingress redirects all 80 to 443 and our cert is valid, and the load balancer terminates the TLS for us.

We have encryption turned off on the Neo4j server and we have HTTPS also turned off.

When I connect to the browser, I'll use the address like https://neo4j.foo.example.com and the browser loads. For the bolt address, then I will use bolt.foo.example.com:443 with user/pass.

What happens next is that we do connect and get the 101 UPGRADE, and some WebSocket frames are exchanged. The client seems to issue the command dbms.routing.getRoutingTable.


But the server responds back with address 0.0.0.0:7687 and this is not routable, so the browser tries to connect to that and fails, and this repeats ad infinitum.


I have tried to disable this with the settings:

dbms.mode=single
causal_clustering.cluster_allow_reads_on_followers=false

as per https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/#config_causal_clu...

I don't want the server to run causal clustering, but we want some of the other enterprise features. We want to run in single mode, and I'm unsure how to get the WebSocket connection back working again.

Can you please advise?

Thanks in advance,
Davis

@davisford there were a number of config changes in 4.0, and the site has a migration guide from 3.5 to 4.0. Could you share your config, and show some logs from the pod, preferably a debug.log dump?

If the server is responding back with 0.0.0.0 that is indeed not routable -- and this suggests that the Neo4j pod is incorrectly configured with respect to its default_advertised_address. I don't remember this detail off hand but be sure to very carefully check your connector settings, as some configuration key names changed in 4.0. So if you copied the config you were using from 3.5, almost certainly that's your problem.

@david.allen attached is debug.log, also neo4j.conf and I also copy the file overrides.conf into the /conf dir, b/c the docs state that the server should pick up any other conf files and apply them as overrides. Not sure if it is working?

Let me know if you see anything. One thing of note, I see the docker boot shell script does some manipulation to the conf at startup. Something else is also adding these properties like:

SERVICE.PORT.BROWSER=7474
SERVICE.PORT.BOLT=7687
SERVICE.PORT=7474
SERVICE.HOST=10.100.1.238
PORT.7687.TCP.PROTO=tcp
PORT.7687.TCP.PORT=7687
PORT.7687.TCP.ADDR=10.100.1.238
PORT.7687.TCP=tcp://10.100.1.238:7687
PORT.7474.TCP.PROTO=tcp
PORT.7474.TCP.PORT=7474
PORT.7474.TCP.ADDR=10.100.1.238
PORT.7474.TCP=tcp://10.100.1.238:7474
PORT=tcp://10.100.1.238:7474

...at runtime. The server logs complain that they don't understand these, but I'm not sure how/why they are getting added. They are not in the default config I am using.
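These keys look like what you get when Kubernetes service environment variables are read back through the environment-variable naming rule of the official Neo4j Docker image. Kubernetes injects variables such as NEO4J_SERVICE_HOST, NEO4J_SERVICE_PORT_BOLT and NEO4J_PORT_7687_TCP_ADDR into pods for a Service named neo4j (this can be switched off with enableServiceLinks: false in the pod spec). The image appears to treat NEO4J_-prefixed variables as settings, with each . in a setting name written as _ and each _ written as __, so those variables would surface as SERVICE.HOST, SERVICE.PORT.BOLT, PORT.7687.TCP.ADDR and so on. The same rule matters when you pass real settings through the environment: a name whose underscores are not doubled would map to a different, unrecognized setting. A minimal Python sketch of the mapping in both directions, with hypothetical function names; check the rule against the documentation for your image version.

```python
PREFIX = "NEO4J_"


def setting_to_env(setting: str) -> str:
    """Setting name -> env var name: double each '_' first, then '.' becomes '_'."""
    return PREFIX + setting.replace("_", "__").replace(".", "_")


def env_to_setting(name: str) -> str:
    """Env var name -> setting name, reversing setting_to_env."""
    body = name[len(PREFIX):] if name.startswith(PREFIX) else name
    # Protect escaped underscores, turn single underscores into dots, restore.
    return body.replace("__", "\0").replace("_", ".").replace("\0", "_")
```

For example, `setting_to_env("dbms.connector.bolt.advertised_address")` gives NEO4J_dbms_connector_bolt_advertised__address, while `env_to_setting("NEO4J_SERVICE_PORT_BOLT")` gives SERVICE.PORT.BOLT, one of the keys listed above.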

@davisford if you grep your neo4j.conf for advertised_address, you'll see the problem: it has a number of entries like this:

dbms.connector.bolt.advertised_address=0.0.0.0:7687

There's your 0.0.0.0 advertisement right there (this also holds in your file for http/https). That isn't routable outside of kubernetes, so you should change that to whatever the externally valid/addressable address should be.
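The same check is easy to script across a whole config file. A minimal Python sketch with hypothetical helper names: it flags any uncommented *advertised_address setting whose host is a wildcard such as 0.0.0.0 or ::. Those values make sense as listen addresses but not as addresses handed to clients. A value with an empty host (for example :7687) is not flagged, since in that case the host part comes from a default setting.

```python
import re

# Uncommented lines of the form <something>advertised_address=<value>
ADVERTISED = re.compile(r"^\s*([\w.]*advertised_address)\s*=\s*(\S+)\s*$")
# Wildcard hosts: fine for listening, not something a client can dial.
WILDCARDS = {"0.0.0.0", "::", "[::]"}


def advertised_host(value: str) -> str:
    """Strip an optional :port from host[:port], handling [ipv6]:port."""
    if value.startswith("["):
        end = value.find("]")
        return value[: end + 1] if end != -1 else value
    if value.count(":") > 1:  # bare IPv6 literal without brackets
        return value
    return value.split(":", 1)[0]


def unroutable_advertisements(conf_text: str):
    """Return (setting, value) pairs whose advertised host is a wildcard."""
    problems = []
    for line in conf_text.splitlines():
        match = ADVERTISED.match(line)
        if match and advertised_host(match.group(2)) in WILDCARDS:
            problems.append(match.groups())
    return problems
```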

Will that affect my internal k8s pods that use a service though? I define a k8s service with labels/selectors and that is how my pods find neo4j. If I fix that property to a DNS entry like bolt.foo.example.com:7687, will the server reject requests from internal k8s IPs?

No. The advertised address is about how the server advertises to the world, it isn't about what connections it will accept. If you advertise an externally routable address, it will still accept connections from anywhere, subject to the network interface you bind to internally and your local firewall rules. For Neo4j in Kubernetes -- I really recommend having a look at this: https://medium.com/neo4j/neo4j-considerations-in-orchestration-environments-584db747dca5

Thanks for that. We are using a storage orchestrator (STORK), thus we don't need a cluster. It replicates the data volumes for us and ensures hyper-convergence.

Hi @david.allen I'm still having a problem with this. I have tried to override these values with environment variables in the deployment / pod spec, but it seems like the docker sh script that is embedded in the container overrides my environment values.

Here's a look at the deployed pod spec with a few things redacted -- note the fqdn-here represents a real DNS fully qualified domain name that I've redacted here.

It is receiving some of my environment variables (e.g. I enable prometheus monitoring and those stick), but it just always seems to overwrite the advertised address to be 0.0.0.0 no matter what I do.

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
    prometheus.io/port: "2004"
    prometheus.io/scrape: "true"
  creationTimestamp: "2020-03-31T22:08:59Z"
  generateName: neo4j-6d6585bcbf-
  labels:
    app: neo4j
    pod-template-hash: 6d6585bcbf
  name: neo4j-6d6585bcbf-fl8pw
  namespace: alma
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: neo4j-6d6585bcbf
    uid: 38aefaa9-739c-11ea-8fd4-0aa6c32e78f9
  resourceVersion: "43523590"
  selfLink: /api/v1/namespaces/alma/pods/neo4j-6d6585bcbf-fl8pw
  uid: 38afec27-739c-11ea-8fd4-0aa6c32e78f9
spec:
  containers:
  - env:
    - name: NEO4J_ACCEPT_LICENSE_AGREEMENT
      value: "yes"
    - name: NEO4J_AUTH
      value: neo4j/Salido4u-2.78
    - name: NEO4J_dbms_mode
      value: single
    - name: NEO4J_metrics_prometheus_enabled
      value: "true"
    - name: NEO4J_metrics_prometheus_endpoint
      value: 0.0.0.0:2004
    - name: NEO4J_dbms_connectors_default_listen_address
      value: 0.0.0.0
    - name: NEO4J_dbms_logs_query_threshold
      value: 2s
    - name: NEO4J_dbms_logs_query_rotation_size
      value: 20m
    - name: NEO4J_dbms_logs_query_rotation_keep_number
      value: "7"
    - name: NEO4J_dbms_logs_query_time_logging_enabled
      value: "true"
    - name: NEO4J_dbms_logs_query_page_logging_enabled
      value: "true"
    - name: NEO4J_dbms_connector_bolt_address
      value: :7687
    - name: NEO4J_dbms_connector_https_advertised_address
      value: fqdn-here:7473
    - name: NEO4J_dbms_connector_http_advertised_address
      value: fqdn-here:7474
    - name: NEO4J_dbms_connector_bolt_advertised_address
      value: fqdn-here:7687
    image: neo4j:4.0.2-enterprise
    imagePullPolicy: IfNotPresent
    name: neo4j
    ports:
    - containerPort: 7474
      name: browser
      protocol: TCP
    - containerPort: 7687
      name: bolt
      protocol: TCP
    - containerPort: 2004
      name: metrics
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/neo4j/data/
      name: neo4jdata
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-2t5rf
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-192-168-174-9.ec2.internal
  nodeSelector:
    beta.kubernetes.io/instance-type: m4.large
  priority: 0
  restartPolicy: Always
  schedulerName: stork
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: neo4jdata
    persistentVolumeClaim:
      claimName: px-neo4j-pvc
  - name: default-token-2t5rf
    secret:
      defaultMode: 420
      secretName: default-token-2t5rf
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-03-31T22:08:59Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-03-31T22:09:00Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-03-31T22:09:00Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-03-31T22:08:59Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://1223f6779066aa0dab7c8c1f482d9f04584ceb623700594ca0095ef8e4a197fa
    image: neo4j:4.0.2-enterprise
    imageID: docker-pullable://neo4j@sha256:a090c2ed169a68bdbf7dd2f1e5b0c47891530d489dc7f5a5f43c8d719b5a32e4
    lastState: {}
    name: neo4j
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2020-03-31T22:09:00Z"
  hostIP: 192.168.174.9
  phase: Running
  podIP: 192.168.174.77
  qosClass: BestEffort
  startTime: "2020-03-31T22:08:59Z"

When I shell into the pod and cat logs/debug.log, I can see that it resets these addresses back to 0.0.0.0, and indeed, when I try the WebSocket connection, it again responds with the 0.0.0.0 address.

Here's a snippet from that log. Note that the Bolt advertised address has been reset to 0.0.0.0 again. What am I missing here?

2020-03-31 22:09:08.198+0000 INFO [o.n.i.d.DiagnosticsManager] --------------------------------------------------------------------------------
2020-03-31 22:09:08.198+0000 INFO [o.n.i.d.DiagnosticsManager]                                 [ DBMS config ]
2020-03-31 22:09:08.198+0000 INFO [o.n.i.d.DiagnosticsManager] --------------------------------------------------------------------------------
2020-03-31 22:09:08.200+0000 INFO [o.n.i.d.DiagnosticsManager] DBMS provided settings:
2020-03-31 22:09:08.209+0000 INFO [o.n.i.d.DiagnosticsManager] causal_clustering.discovery_advertised_address=neo4j-6d6585bcbf-fl8pw:5000
2020-03-31 22:09:08.209+0000 INFO [o.n.i.d.DiagnosticsManager] causal_clustering.discovery_listen_address=0.0.0.0:5000
2020-03-31 22:09:08.209+0000 INFO [o.n.i.d.DiagnosticsManager] causal_clustering.raft_advertised_address=neo4j-6d6585bcbf-fl8pw:7000
2020-03-31 22:09:08.210+0000 INFO [o.n.i.d.DiagnosticsManager] causal_clustering.raft_listen_address=0.0.0.0:7000
2020-03-31 22:09:08.210+0000 INFO [o.n.i.d.DiagnosticsManager] causal_clustering.transaction_advertised_address=neo4j-6d6585bcbf-fl8pw:6000
2020-03-31 22:09:08.210+0000 INFO [o.n.i.d.DiagnosticsManager] causal_clustering.transaction_listen_address=0.0.0.0:6000
2020-03-31 22:09:08.210+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.connector.bolt.advertised_address=0.0.0.0:7687
2020-03-31 22:09:08.211+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.connector.bolt.enabled=true
2020-03-31 22:09:08.211+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.connector.http.advertised_address=0.0.0.0:7474
2020-03-31 22:09:08.211+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.connector.http.enabled=true
2020-03-31 22:09:08.211+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.connector.https.advertised_address=0.0.0.0:7473
2020-03-31 22:09:08.211+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.connector.https.enabled=false
2020-03-31 22:09:08.211+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.default_listen_address=0.0.0.0
2020-03-31 22:09:08.212+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.directories.import=/var/lib/neo4j/import
2020-03-31 22:09:08.212+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.directories.logs=/logs
2020-03-31 22:09:08.212+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.directories.neo4j_home=/var/lib/neo4j
2020-03-31 22:09:08.212+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.jvm.additional=-Djdk.tls.rejectClientInitiatedRenegotiation=true
2020-03-31 22:09:08.212+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.logs.query.rotation.size=20971520
2020-03-31 22:09:08.213+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.logs.query.threshold=2s
2020-03-31 22:09:08.213+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.memory.pagecache.size=512M
2020-03-31 22:09:08.213+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.mode=SINGLE
2020-03-31 22:09:08.213+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.tx_log.rotation.retention_policy=100M size
2020-03-31 22:09:08.213+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.windows_service_name=neo4j
2020-03-31 22:09:08.213+0000 INFO [o.n.i.d.DiagnosticsManager] metrics.prometheus.enabled=true
2020-03-31 22:09:08.214+0000 INFO [o.n.i.d.DiagnosticsManager] metrics.prometheus.endpoint=0.0.0.0:2004
2020-03-31 22:09:08.214+0000 INFO [o.n.i.d.DiagnosticsManager]

Never mind, I got it. I had specified the environment variable incorrectly: an underscore that is part of a setting name has to be written as two underscores (advertised__address). I use a kustomize patch, and it works now (I have an Ingress that routes Bolt on port 443 to 7687 on the service):

- op: add
  path: /spec/template/spec/containers/0/env/-
  value:
    name: NEO4J_dbms_connector_bolt_advertised__address
    value: "bolt.somewhere.com:443"
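The naming rule behind this fix can be sketched in JavaScript. This is a minimal sketch, assuming the convention of the official Neo4j Docker image for 4.x: strip the NEO4J_ prefix, turn each single underscore into a dot, and turn each double underscore into a literal underscore. The helper envVarToSetting is hypothetical; the image's own entrypoint shell script is the authority, and it treats some variables (such as NEO4J_AUTH, used for the initial credentials) specially.

```javascript
// Hypothetical helper: approximates how the Neo4j Docker image maps
// NEO4J_* environment variable names onto neo4j.conf setting names.
function envVarToSetting(name) {
  if (!name.startsWith("NEO4J_")) {
    return null; // not a Neo4j setting variable
  }
  return name
    .slice("NEO4J_".length)
    .replace(/_/g, ".")     // every underscore becomes a dot...
    .replace(/\.\./g, "_"); // ...then a doubled underscore (now "..") becomes "_"
}

console.log(envVarToSetting("NEO4J_dbms_connector_bolt_advertised__address"));
// dbms.connector.bolt.advertised_address
console.log(envVarToSetting("NEO4J_dbms_connector_bolt_advertised_address"));
// dbms.connector.bolt.advertised.address
```

Read this way, the single-underscore variable in the earlier pod spec maps to dbms.connector.bolt.advertised.address, which is not a real setting, so the intended FQDN never reached dbms.connector.bolt.advertised_address.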

Thanks David - this has really helped me track down a similar problem I was having!

heldersepu

The documentation says:
https://aws.amazon.com/marketplace/pp/Neo4j-Neo4j-Graph-Database-Community-Edition/B071P26C9D#pdp-us...
The username will be neo4j, and the password will be the instance ID.

That is not true: the password was neo4j, and logging in via cypher-shell prompts you to change it.

lavanya_kannan

@david.allen I get these exact error messages when I run some queries through cypher-shell against our team's Neo4j 4.1 Community Edition installation on a Microsoft Azure server, which we access remotely. The error appears consistently, but only after the query has been running successfully for 2-3 hours.

Commands:

  1. cat $CQL_FILES/xyz.cql | $CYPHERSHELL -u neo4j -p admin123 -a bolt://localhost:7687 > $CQL_LOGS/xyz.log consistently fails with the following error after 2-3 hours:
    Connection to the database terminated. Please ensure that your database is listening on the correct host and port and that you have compatible encryption settings both on Neo4j server and driver. Note that the default encryption setting has changed in Neo4j 4.0

  2. cat $CQL_FILES/xyz.cql | $CYPHERSHELL -u neo4j -p admin123 > $CQL_LOGS/xyz.log (with the default address) fails immediately with the following error:
    Failed to obtain connection towards WRITE server. Known routing table is: Ttl 1595167651163, currentTime 1595167381208, routers AddressSet=[], writers AddressSet=[], readers AddressSet=[], database '<default database>'

Here are the connector settings in our neo4j.conf file. Please let us know how to troubleshoot the issues above.

#dbms.default_listen_address=0.0.0.0

dbms.connectors.default_listen_address=0.0.0.0

# Bolt connector

dbms.connector.bolt.enabled=true

#dbms.connector.bolt.tls_level=DISABLED

dbms.connector.bolt.listen_address=0.0.0.0:7687

dbms.connector.bolt.address=0.0.0.0:7687

dbms.connector.bolt.advertised_address=A.B.C.D

# HTTP Connector. There can be zero or one HTTP connectors.

dbms.connector.http.enabled=true

dbms.connector.http.listen_address=:7474

# HTTPS Connector. There can be zero or one HTTPS connectors.

dbms.connector.https.enabled=false

#dbms.connector.https.listen_address=:7473

PS: I have just enabled the connector listen address and will post my updates when I have them.

Thanks,
Lavanya

This error usually means that the addresses in the routing table are not reachable from the client's perspective. This happens, for example, when the advertised address is set to something the client machine cannot access. Try connecting with bolt:// only (not neo4j://), run CALL dbms.routing.getRoutingTable({}, 'system'); and look at the results. If you see any addresses that can't be reached from the client, that's the most likely problem.
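That check can be partly automated. The sketch below is a heuristic in JavaScript, not part of any Neo4j tool: the helper findSuspectAddresses is hypothetical, and it assumes the servers column of the procedure's result has been copied into an array of { role, addresses } objects. It flags only wildcard and loopback hosts, which no remote client can use; an address that passes may still be blocked by a firewall or resolve differently on the client.

```javascript
// Extract the host part of "host:port" (IPv6 literals look like "[::1]:7687").
function hostOf(address) {
  if (address.startsWith("[")) {
    return address.slice(1, address.indexOf("]"));
  }
  const colon = address.lastIndexOf(":");
  return colon === -1 ? address : address.slice(0, colon);
}

// Wildcard and loopback hosts are never valid destinations for a remote client.
function isUnusableFromRemote(host) {
  const h = host.toLowerCase();
  return (
    h === "0.0.0.0" || // wildcard: means "all interfaces", not a destination
    h === "::" ||
    h === "localhost" ||
    h === "::1" ||
    /^127\./.test(h) // IPv4 loopback range
  );
}

// Hypothetical helper: servers is [{ role: "WRITE", addresses: ["host:7687"] }, ...]
function findSuspectAddresses(servers) {
  const suspects = [];
  for (const { role, addresses } of servers) {
    for (const address of addresses) {
      if (isUnusableFromRemote(hostOf(address))) {
        suspects.push(`${role} ${address}`);
      }
    }
  }
  return suspects;
}

// db.example.com is a placeholder host name.
const table = [
  { role: "WRITE", addresses: ["0.0.0.0:7687"] },
  { role: "READ", addresses: ["db.example.com:7687"] },
];
console.log(findSuspectAddresses(table)); // [ 'WRITE 0.0.0.0:7687' ]
```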

I got the following:

[
  {
    "addresses": [
      "A.B.C.D:7687"
    ],
    "role": "WRITE"
  },
  {
    "addresses": [
      "A.B.C.D:7687"
    ],
    "role": "READ"
  },
  {
    "addresses": [
      "A.B.C.D:7687"
    ],
    "role": "ROUTE"
  }
]

The above seems all right, but I still see both errors. Do you think the real issue is with the encryption settings in the driver? Please see the thread "Which version of Neo 4J driver to install for Neo4j versions 4.0 and 4.1".

Lavanya

I have a problem connecting to Neo4j from .NET.

Error:
An unhandled exception of type 'Neo4j.Driver.V1.ServiceUnavailableException' occurred in Neo4j.Driver.dll

Additional information: Failed after retried for 6 times in 30000ms. Make sure that your database is online and retry again.

I'm using:
Visual Studio 2015
Neo4j driver 1.7
Neo4j Desktop with Neo4j 4.1.0

I'm having trouble connecting to Neo4j Browser behind HAProxy. I'm running 4.1 Enterprise in a causal cluster configuration. I have no issues connecting to individual cluster members; however, the browser fails to load when I access the cluster via HAProxy. The same configuration worked in 3.5 Enterprise.

The proxy log and the browser's network traffic show that it connects to a server and starts loading JS files. It appears to time out while loading ui.chunkhash.bundle.js, cypher-codemirror.chunkhash.bundle.js, or app-340ee6332805876eb588.js.

I increased the HAProxy timeouts to 2 minutes. I tried setting the default advertised address to the load balancer and to the cluster member; neither worked. Does Neo4j 4.1 Enterprise work behind HAProxy or any other load balancer?

@david.allen
We're facing this issue: we're trying to visualize Neo4j data in a React front end using the neovis library, and we get "Uncaught Error: Encryption/trust can only be configured either through URL or config, not both". We currently use Neo4j 3.5 deployed on AWS.

@pratikmakune3 This error:

Uncaught Error: Encryption/trust can only be configured either through URL or config, not both

When you create a driver instance, you can pass it configuration parameters. One of them is "trust" which specifies whether or not to trust self-signed certificates, for example. Here's an example of driver configuration options I'm talking about: https://neo4j.com/docs/api/javascript-driver/current/function/index.html#static-function-driver

When you specify a Neo4j URL, you can specify the same information. For example, neo4j+s:// means that you insist on certificates signed by a trusted CA, while neo4j+ssc:// means that self-signed certificates are also OK.

If you did this in javascript:

const driver = neo4j.driver("neo4j+ssc://myhost", authDetails, { trust: 'TRUST_SYSTEM_CA_SIGNED_CERTIFICATES' })

Then you would both be telling the driver to only trust system CA signed certs but ALSO be telling it to trust self-signed certs as well. This is a conflict, and so you would get this error.

The solution is to specify the trust strategy in EITHER the URL or the driver settings, but never both, which explains the message. For example, if in that code example you used neo4j:// instead of neo4j+ssc:// it would probably work.
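The rule can be turned into a pre-flight check that runs before neo4j.driver() is called. This is a sketch under stated assumptions: hasTrustConflict is a hypothetical helper, not a driver API, and it assumes, as described above, that a +s or +ssc scheme already fixes encryption and trust, so the config object must then contain neither an encrypted nor a trust key (our reading of the 4.x JavaScript driver's behavior).

```javascript
// Hypothetical pre-flight check, NOT part of neo4j-driver: returns true when
// encryption/trust would be configured both in the URL and in the config.
function hasTrustConflict(url, config = {}) {
  const sep = url.indexOf("://");
  const scheme = sep === -1 ? "" : url.slice(0, sep);
  // Assumption: "+s" (CA-signed certs only) and "+ssc" (self-signed certs
  // accepted) schemes already switch on encryption and pick a trust strategy.
  const urlSetsTrust = /\+(s|ssc)$/.test(scheme);
  // Either of these keys in the config would set the same thing a second time.
  const configSetsTrust = "encrypted" in config || "trust" in config;
  return urlSetsTrust && configSetsTrust;
}

const trustConfig = { trust: "TRUST_SYSTEM_CA_SIGNED_CERTIFICATES" };
console.log(hasTrustConflict("neo4j+ssc://myhost", trustConfig)); // true
console.log(hasTrustConflict("neo4j://myhost", trustConfig)); // false
console.log(hasTrustConflict("neo4j+ssc://myhost", {})); // false
```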

john_allen

I solved our issue here, and it works well, except that MongoDB relationships stored as ObjectIds are not showing as lines in Neo4j. We solved the connection issue by making sure our Neo4j database had been created (using v3.5.17), so there is now no connection issue from mongo-connector.

shawnngtq

@david.allen

Neo4j version: 4.2.6

I have a problem using cypher-shell after making these changes to neo4j.conf:

dbms.default_listen_address=0.0.0.0

dbms.default_advertised_address=abc.com

dbms.connector.bolt.tls_level=REQUIRED

dbms.connector.http.enabled=false

dbms.connector.https.enabled=true

dbms.ssl.policy.bolt.enabled=true
dbms.ssl.policy.bolt.base_directory=certificates/bolt
dbms.ssl.policy.bolt.private_key=private.key
dbms.ssl.policy.bolt.public_certificate=public.crt
dbms.ssl.policy.bolt.client_auth=NONE

dbms.ssl.policy.https.enabled=true
dbms.ssl.policy.https.base_directory=certificates/https
dbms.ssl.policy.https.private_key=private.key
dbms.ssl.policy.https.public_certificate=public.crt
dbms.ssl.policy.https.client_auth=NONE

In a web browser, abc.com:7473 works fine, while abc.com:7687 returns "not a WebSocket handshake request: missing upgrade".

# expected error when I call without stating address
$ ./cypher-shell
Connection to the database terminated. Please ensure that your database is listening on the correct host and port and that you have compatible encryption settings both on Neo4j server and driver. Note that the default encryption setting has changed in Neo4j 4.0.
# I tried the following, but they gave the same error
$ ./cypher-shell -a abc.com:7687
$ ./cypher-shell -a neo4j://abc.com:7687
$ ./cypher-shell -a neo4j+s://abc.com:7687
$ ./cypher-shell -a bolt://abc.com:7687
$ ./cypher-shell -a bolt+s://abc.com:7687

This problem goes away if I set dbms.connector.bolt.tls_level=OPTIONAL. Can you help me understand how to fix this?

Also, I'd like your opinion on dbms.default_listen_address=0.0.0.0: to make the server more secure, should I change 0.0.0.0 to a specific IP?

nikhil1

Hi

I am also getting intermittent connectivity issues from Neo4j Desktop to the server deployed in an Azure VM.

95% of the time it works fine, but sometimes, when my coworkers try to connect, they get that WebSocket connection error, which says that the reason for the failure is not available in the browser (Neo4j Desktop).

Any idea why there would be intermittent issues? How can I check what is going wrong? It occurs rarely and resolves itself within half an hour or so.
This is the exact error message:

Tue, 15 Jun 2021 12:59:50 GMT

WebSocket connection failure. Due to security constraints in your web browser, the reason for the failure is not available to this Neo4j Driver. Please use your browsers development console to determine the root cause of the failure. Common reasons i..."
Neo4j version: 4.2.2
Neo4j Desktop version: 1.4.5

I downloaded the "Neo4j - Community Edition" AMI from AWS Marketplace today. After about 6 hours, I was able to connect from Desktop. Here is what I did:
  1. Set dbms.default_listen_address=0.0.0.0
  2. Changed the advertised addresses to the external IPs:
    dbms.connector.bolt.advertised_address
    dbms.default_advertised_address
  3. Hard-coded these settings in template.conf (after chmod go+rw)
Connecting on port 7474 seems to work, but 7473 does not.
Make sure the security groups permit traffic on all 3 ports:

Security group rule ID | Port range | Protocol | Source | Security groups
sgr-0b28a793018f4e67a | 7473 | TCP | 0.0.0.0/0 | Neo4j - Community Edition-4-4-3-AutogenByAWSMP-
sgr-0e0f5b3241925da6d | 7474 | TCP | 0.0.0.0/0 | Neo4j - Community Edition-4-4-3-AutogenByAWSMP-
sgr-0c9c074eb3a5a8fe3 | 7687 | TCP | 0.0.0.0/0 | Neo4j - Community Edition-4-4-3-AutogenByAWSMP-
sgr-010232536bc92fafd | 22 | TCP | 0.0.0.0/0 | Neo4j - Community Edition-4-4-3-AutogenByAWSMP-
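Several reports in this thread come down to telling "nothing is listening" apart from "packets are being dropped". A minimal sketch using only Node's built-in net module, assuming a plain TCP check is enough for a first diagnosis: a refused connection (such as curl's "Connection refused") usually means no process is listening on that port, while a timeout usually points at a firewall or security group silently dropping packets. probePort and the host name in the usage comment are hypothetical. An "open" result only shows that something accepted a TCP connection; it says nothing about Bolt, TLS, or advertised-address settings.

```javascript
const net = require("net");

// Try a TCP connection and classify the outcome:
//   "open"    -> something accepted the connection
//   "refused" -> host answered, but nothing is listening on that port
//   "timeout" -> no answer at all, often a firewall dropping packets
function probePort(host, port, timeoutMs = 3000) {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    const finish = (result) => {
      socket.destroy(); // always release the socket; resolve() ignores repeats
      resolve(result);
    };
    socket.setTimeout(timeoutMs, () => finish("timeout"));
    socket.once("connect", () => finish("open"));
    socket.once("error", (err) =>
      finish(err.code === "ECONNREFUSED" ? "refused" : `error: ${err.code}`)
    );
  });
}

// Usage (neo4j.example.com is a placeholder): check the default Neo4j ports.
// for (const port of [7473, 7474, 7687]) {
//   probePort("neo4j.example.com", port).then((r) => console.log(port, r));
// }
```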

gpgupta7891

Hey @david_allen,

I have built an Ubuntu server, and the URL https://neo4j_ip:7473 was working fine. We rebooted the server a couple of times and stopped/started the neo4j service to test the backups, but now the URL is not working.

I ran the command: sudo netstat -ltnp

Here are the results:

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 745/systemd-resolve
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 917/sshd
tcp6 0 0 :::22 :::* LISTEN 917/sshd

Another command I ran: curl localhost:7473

Results:

curl: (7) Failed to connect to localhost port 7473: Connection refused

I am running Ubuntu Server 18.04.

I think the server has stopped listening on the Neo4j ports. Can you please let me know what I should do to fix it?