Troubleshooting Connection Issues to Neo4j

This turned out to be a configuration issue in neo4j.conf.

For some reason, the following line was commented out:

dbms.connector.http.enabled=true

Not surprisingly, the server ignored port 7474 while configured that way. I don't remember doing that; I wonder whether the default distribution ships that way?

Anyway, once I turned it on, the Neo4j Browser started working. I'm still at v4.0.0.0, by the way. I've left authentication turned off:

dbms.security.auth_enabled=false

Both Chrome and Firefox seem to be doing just fine, neither is complaining.

@david_allen I have a single node neo4j:4.0.1-enterprise in a Kubernetes cluster. It is behind a load balancer and we are using Ingress to expose the browser and the bolt connection via the following configuration:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: alma-ingress
spec:
  rules:
    - host: neo4j.foo.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: neo4j
              servicePort: 7474
    - host: bolt.foo.example.com
      http:
        paths:
          - path: /
            backend:
              serviceName: neo4j
              servicePort: 7687

This type of configuration had worked fine for us on neo4j:3.5-enterprise, so we could connect to the browser just fine. We are using Amazon certs so SSL/TLS is legit and not a problem.

When we upgraded to 4.0 this broke. Our load balancer exposes only two ports: 80, 443. Our Ingress redirects all 80 to 443 and our cert is valid, and the load balancer terminates the TLS for us.

We have encryption turned off on the Neo4j server and we have HTTPS also turned off.

When I connect to the browser, I'll use the address like https://neo4j.foo.example.com and the browser loads. For the bolt address, then I will use bolt.foo.example.com:443 with user/pass.

What happens next is we do connect and get the 101 UPGRADE, and there are some websocket frames exchanged. The client issues the command dbms.routing.getRoutingTable it seems:

[Screenshot: Screen Shot 2020-03-04 at 10.23.39 AM]

But the server responds back with address 0.0.0.0:7687 and this is not routable, so the browser tries to connect to that and fails, and this repeats ad infinitum.

[Screenshot: Screen Shot 2020-03-04 at 10.24.55 AM]

I have tried to disable this with the settings:

dbms.mode=single
causal_clustering.cluster_allow_reads_on_followers=false

as per Configuration settings - Operations Manual

I don't want the server to run causal clustering, but we want some of the other enterprise features. We want to run in single mode, and I'm unsure how to get the WebSocket connection back working again.

Can you please advise?

Thanks in advance,
Davis

@davisford there were a number of config changes in 4.0, and the site has a migration guide from 3.5 -> 4.0. Could you share your config and show some logs from the pod, preferably a debug.log dump?

If the server is responding back with 0.0.0.0 that is indeed not routable -- and this suggests that the Neo4j pod is incorrectly configured with respect to its default_advertised_address. I don't remember this detail off hand but be sure to very carefully check your connector settings, as some configuration key names changed in 4.0. So if you copied the config you were using from 3.5, almost certainly that's your problem.

@david_allen attached is debug.log, also neo4j.conf and I also copy the file overrides.conf into the /conf dir, b/c the docs state that the server should pick up any other conf files and apply them as overrides. Not sure if it is working?

Let me know if you see anything. One thing of note, I see the docker boot shell script does some manipulation to the conf at startup. Something else is also adding these properties like:

SERVICE.PORT.BROWSER=7474
SERVICE.PORT.BOLT=7687
SERVICE.PORT=7474
SERVICE.HOST=10.100.1.238
PORT.7687.TCP.PROTO=tcp
PORT.7687.TCP.PORT=7687
PORT.7687.TCP.ADDR=10.100.1.238
PORT.7687.TCP=tcp://10.100.1.238:7687
PORT.7474.TCP.PROTO=tcp
PORT.7474.TCP.PORT=7474
PORT.7474.TCP.ADDR=10.100.1.238
PORT.7474.TCP=tcp://10.100.1.238:7474
PORT=tcp://10.100.1.238:7474

...at runtime. The server logs complain that it doesn't understand these, but I'm not sure how or why they are getting added. They are not in the default config I am using.

Attachments: debug.log.txt (110.5 KB), neo4j.conf.txt (36.6 KB), overrides.conf.txt (1.5 KB)

@davisford if you grep your neo4j.conf for advertised_address, you'll see the problem. It has a number of entries like this:

dbms.connector.bolt.advertised_address=0.0.0.0:7687

There's your 0.0.0.0 advertisement right there (this also holds in your file for http/https). That isn't routable outside of kubernetes, so you should change that to whatever the externally valid/addressable address should be.
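
The grep described above can also be sketched as a small script. This is only an illustration under stated assumptions: the function name unroutable_advertised is hypothetical, and it assumes plain key=value lines whose values are IPv4 addresses or hostnames (no bracketed IPv6).

```python
import ipaddress


def unroutable_advertised(conf_text):
    """Return (key, value) pairs whose advertised host is a wildcard or loopback.

    Hypothetical helper: it only looks at active (uncommented) key=value lines
    whose key contains "advertised_address".
    """
    findings = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if "advertised_address" not in key:
            continue
        # Drop an optional ":port" suffix to get the host part.
        host = value.rsplit(":", 1)[0] if ":" in value else value
        if host == "localhost":
            findings.append((key, value))
            continue
        try:
            ip = ipaddress.ip_address(host)
        except ValueError:
            continue  # a DNS name or empty host; nothing to flag here
        if ip.is_unspecified or ip.is_loopback:
            findings.append((key, value))
    return findings
```

Calling it with the text of neo4j.conf returns the offending entries, for example the dbms.connector.bolt.advertised_address=0.0.0.0:7687 line quoted above.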

Will that affect my internal k8s pods that use a service though? I define a k8s service with labels/selectors and that is how my pods find neo4j. If I fix that property to a DNS entry like bolt.foo.example.com:7687, will the server reject requests from internal k8s IPs?

No. The advertised address is about how the server advertises to the world, it isn't about what connections it will accept. If you advertise an externally routable address, it will still accept connections from anywhere, subject to the network interface you bind to internally and your local firewall rules. For Neo4j in Kubernetes -- I really recommend having a look at this: Neo4j Considerations in Orchestration Environments | by David Allen | Neo4j Developer Blog | Medium

Thanks for that. We are using a storage orchestrator (STORK), thus we don't need a cluster. It replicates the data volumes for us and ensures hyper-convergence.

Hi @david_allen I'm still having a problem with this. I have tried to override these values with environment variables in the deployment / pod spec, but it seems like the docker sh script that is embedded in the container overrides my environment values.

Here's a look at the deployed pod spec with a few things redacted -- note the fqdn-here represents a real DNS fully qualified domain name that I've redacted here.

It is receiving some of my environment variables (e.g. I enable prometheus monitoring and those stick), but it just always seems to overwrite the advertised address to be 0.0.0.0 no matter what I do.

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
    prometheus.io/port: "2004"
    prometheus.io/scrape: "true"
  creationTimestamp: "2020-03-31T22:08:59Z"
  generateName: neo4j-6d6585bcbf-
  labels:
    app: neo4j
    pod-template-hash: 6d6585bcbf
  name: neo4j-6d6585bcbf-fl8pw
  namespace: alma
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: neo4j-6d6585bcbf
    uid: 38aefaa9-739c-11ea-8fd4-0aa6c32e78f9
  resourceVersion: "43523590"
  selfLink: /api/v1/namespaces/alma/pods/neo4j-6d6585bcbf-fl8pw
  uid: 38afec27-739c-11ea-8fd4-0aa6c32e78f9
spec:
  containers:
  - env:
    - name: NEO4J_ACCEPT_LICENSE_AGREEMENT
      value: "yes"
    - name: NEO4J_AUTH
      value: neo4j/Salido4u-2.78
    - name: NEO4J_dbms_mode
      value: single
    - name: NEO4J_metrics_prometheus_enabled
      value: "true"
    - name: NEO4J_metrics_prometheus_endpoint
      value: 0.0.0.0:2004
    - name: NEO4J_dbms_connectors_default_listen_address
      value: 0.0.0.0
    - name: NEO4J_dbms_logs_query_threshold
      value: 2s
    - name: NEO4J_dbms_logs_query_rotation_size
      value: 20m
    - name: NEO4J_dbms_logs_query_rotation_keep_number
      value: "7"
    - name: NEO4J_dbms_logs_query_time_logging_enabled
      value: "true"
    - name: NEO4J_dbms_logs_query_page_logging_enabled
      value: "true"
    - name: NEO4J_dbms_connector_bolt_address
      value: :7687
    - name: NEO4J_dbms_connector_https_advertised_address
      value: fqdn-here:7473
    - name: NEO4J_dbms_connector_http_advertised_address
      value: fqdn-here:7474
    - name: NEO4J_dbms_connector_bolt_advertised_address
      value: fqdn-here:7687
    image: neo4j:4.0.2-enterprise
    imagePullPolicy: IfNotPresent
    name: neo4j
    ports:
    - containerPort: 7474
      name: browser
      protocol: TCP
    - containerPort: 7687
      name: bolt
      protocol: TCP
    - containerPort: 2004
      name: metrics
      protocol: TCP
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/neo4j/data/
      name: neo4jdata
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-2t5rf
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-192-168-174-9.ec2.internal
  nodeSelector:
    beta.kubernetes.io/instance-type: m4.large
  priority: 0
  restartPolicy: Always
  schedulerName: stork
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: neo4jdata
    persistentVolumeClaim:
      claimName: px-neo4j-pvc
  - name: default-token-2t5rf
    secret:
      defaultMode: 420
      secretName: default-token-2t5rf
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-03-31T22:08:59Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-03-31T22:09:00Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-03-31T22:09:00Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-03-31T22:08:59Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://1223f6779066aa0dab7c8c1f482d9f04584ceb623700594ca0095ef8e4a197fa
    image: neo4j:4.0.2-enterprise
    imageID: docker-pullable://neo4j@sha256:a090c2ed169a68bdbf7dd2f1e5b0c47891530d489dc7f5a5f43c8d719b5a32e4
    lastState: {}
    name: neo4j
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2020-03-31T22:09:00Z"
  hostIP: 192.168.174.9
  phase: Running
  podIP: 192.168.174.77
  qosClass: BestEffort
  startTime: "2020-03-31T22:08:59Z"

When I shell into the pod itself and cat logs/debug.log I can see it resets these addresses back to 0.0.0.0, and indeed when I try the WebSocket it again responds with 0.0.0.0 address.

Here's a snippet from that log. Note that the bolt advertised address has been reset to 0.0.0.0 again. What am I missing here?

2020-03-31 22:09:08.198+0000 INFO [o.n.i.d.DiagnosticsManager] --------------------------------------------------------------------------------
2020-03-31 22:09:08.198+0000 INFO [o.n.i.d.DiagnosticsManager]                                 [ DBMS config ]
2020-03-31 22:09:08.198+0000 INFO [o.n.i.d.DiagnosticsManager] --------------------------------------------------------------------------------
2020-03-31 22:09:08.200+0000 INFO [o.n.i.d.DiagnosticsManager] DBMS provided settings:
2020-03-31 22:09:08.209+0000 INFO [o.n.i.d.DiagnosticsManager] causal_clustering.discovery_advertised_address=neo4j-6d6585bcbf-fl8pw:5000
2020-03-31 22:09:08.209+0000 INFO [o.n.i.d.DiagnosticsManager] causal_clustering.discovery_listen_address=0.0.0.0:5000
2020-03-31 22:09:08.209+0000 INFO [o.n.i.d.DiagnosticsManager] causal_clustering.raft_advertised_address=neo4j-6d6585bcbf-fl8pw:7000
2020-03-31 22:09:08.210+0000 INFO [o.n.i.d.DiagnosticsManager] causal_clustering.raft_listen_address=0.0.0.0:7000
2020-03-31 22:09:08.210+0000 INFO [o.n.i.d.DiagnosticsManager] causal_clustering.transaction_advertised_address=neo4j-6d6585bcbf-fl8pw:6000
2020-03-31 22:09:08.210+0000 INFO [o.n.i.d.DiagnosticsManager] causal_clustering.transaction_listen_address=0.0.0.0:6000
2020-03-31 22:09:08.210+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.connector.bolt.advertised_address=0.0.0.0:7687
2020-03-31 22:09:08.211+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.connector.bolt.enabled=true
2020-03-31 22:09:08.211+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.connector.http.advertised_address=0.0.0.0:7474
2020-03-31 22:09:08.211+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.connector.http.enabled=true
2020-03-31 22:09:08.211+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.connector.https.advertised_address=0.0.0.0:7473
2020-03-31 22:09:08.211+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.connector.https.enabled=false
2020-03-31 22:09:08.211+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.default_listen_address=0.0.0.0
2020-03-31 22:09:08.212+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.directories.import=/var/lib/neo4j/import
2020-03-31 22:09:08.212+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.directories.logs=/logs
2020-03-31 22:09:08.212+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.directories.neo4j_home=/var/lib/neo4j
2020-03-31 22:09:08.212+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.jvm.additional=-Djdk.tls.rejectClientInitiatedRenegotiation=true
2020-03-31 22:09:08.212+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.logs.query.rotation.size=20971520
2020-03-31 22:09:08.213+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.logs.query.threshold=2s
2020-03-31 22:09:08.213+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.memory.pagecache.size=512M
2020-03-31 22:09:08.213+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.mode=SINGLE
2020-03-31 22:09:08.213+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.tx_log.rotation.retention_policy=100M size
2020-03-31 22:09:08.213+0000 INFO [o.n.i.d.DiagnosticsManager] dbms.windows_service_name=neo4j
2020-03-31 22:09:08.213+0000 INFO [o.n.i.d.DiagnosticsManager] metrics.prometheus.enabled=true
2020-03-31 22:09:08.214+0000 INFO [o.n.i.d.DiagnosticsManager] metrics.prometheus.endpoint=0.0.0.0:2004
2020-03-31 22:09:08.214+0000 INFO [o.n.i.d.DiagnosticsManager]

Never mind, I got it. I specified the environment variable incorrectly: an underscore in the setting name needs two underscores in the variable name. I use a kustomize patch and this works now (I have an Ingress that routes bolt on 443 to 7687 on the service):

- op: add
  path: /spec/template/spec/containers/0/env/-
  value:
    name: NEO4J_dbms_connector_bolt_advertised__address
    value: "bolt.somewhere.com:443"
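
For anyone hitting the same naming issue, here is a minimal sketch of the mapping from a setting name to its environment variable, as I understand the official Docker image's convention: prefix with NEO4J_, write each underscore twice, and turn periods into single underscores. The helper name neo4j_env_name is hypothetical.

```python
def neo4j_env_name(setting):
    """Map a neo4j.conf setting name to the Docker image's env variable name.

    Assumed convention: "_" becomes "__", "." becomes "_", prefix "NEO4J_".
    Underscores are doubled first so the underscores produced from periods
    are not doubled as well.
    """
    return "NEO4J_" + setting.replace("_", "__").replace(".", "_")


print(neo4j_env_name("dbms.connector.bolt.advertised_address"))
```

This is why NEO4J_dbms_connector_bolt_advertised_address in the pod spec above had no effect on dbms.connector.bolt.advertised_address, while names without underscores, such as NEO4J_dbms_mode, were picked up.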

Dear nithin,

I want to change the configuration of the Neo4j DB, but the files are all owned by the neo4j OS user. The AWS configuration provides the ubuntu user to connect to the EC2 instance.
So how can you restart the server as well as change the DB configuration?
Do you need to log in to the EC2 instance as the neo4j user?
Is there a default password for the neo4j user?

Thanks in advance

On AWS there is no password for users. You connect by secure SSH keys generated by AWS.

To restart the system service, run systemctl restart neo4j. Any configuration changes you have made will be picked up.

Thanks David,

I can restart the neo4j service. However, I cannot update the neo4j.conf file due to lack of permission.
Could you give me further advice?

I also want to move the /data folder to a bigger storage folder, but I have no permission to create a new folder and move the data files into it.

I found a way to do this using the sudo command.
Thanks for your help.

The documentation says:

The username will be neo4j, and the password will be the instance ID.

That is not true: the password was neo4j, and logging in via cypher-shell prompts you to change it.

@david_allen I get these exact error messages over remote access to our team's Neo4j 4.1 Community Edition installation on a Microsoft Azure server when I run some queries through cypher-shell. Actually, the error consistently appears only after the query has been executing successfully for 2-3 hours.

Command:

  1. cat $CQL_FILES/xyz.cql | $CYPHERSHELL -u neo4j -p admin123 -a bolt://localhost:7687 > $CQL_LOGS/xyz.log results in the following error consistently after 2-3 hours:
    Connection to the database terminated. Please ensure that your database is listening on the correct host and port and that you have compatible encryption settings both on Neo4j server and driver. Note that the default encryption setting has changed in Neo4j 4.0

  2. cat $CQL_FILES/xyz.cql | $CYPHERSHELL -u neo4j -p admin123 > $CQL_LOGS/xyz.log with the default address results in the following error immediately:
    Failed to obtain connection towards WRITE server. Known routing table is: Ttl 1595167651163, currentTime 1595167381208, routers AddressSet=[], writers AddressSet=[], readers AddressSet=[], database '<default database>'

Here are the connector configurations in neo4j.conf file. Kindly let us know how to troubleshoot the above issues.

#dbms.default_listen_address=0.0.0.0

dbms.connectors.default_listen_address=0.0.0.0

# Bolt connector

dbms.connector.bolt.enabled=true

#dbms.connector.bolt.tls_level=DISABLED

dbms.connector.bolt.listen_address=0.0.0.0:7687

dbms.connector.bolt.address=0.0.0.0:7687

dbms.connector.bolt.advertised_address=A.B.C.D

# HTTP Connector. There can be zero or one HTTP connectors.

dbms.connector.http.enabled=true

dbms.connector.http.listen_address=:7474

# HTTPS Connector. There can be zero or one HTTPS connectors.

dbms.connector.https.enabled=false

#dbms.connector.https.listen_address=:7473

PS: I have just enabled the connector listen address and will post my updates when I have them.

Thanks,
Lavanya

This error usually means that the addresses in the routing table are not routable from the client's perspective. This happens, for example, when the advertised address is set to something that the client machine cannot access. Try connecting using only the bolt:// scheme (not neo4j://), run CALL dbms.routing.getRoutingTable({}, 'system'); and see what the results say. If you see any addresses that can't be routed from the client's perspective, that's the most likely problem.
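
One rough way to check routability from the client machine is to try a plain TCP connection to every address in the routing table. The sketch below assumes the rows have "addresses" and "role" keys holding host:port strings, as in the output of the procedure; the function names are hypothetical, and a successful TCP connect only shows the port is reachable, not that Bolt or encryption settings are correct.

```python
import socket


def split_address(address):
    """Split "host:port" into (host, port); assumes no bracketed IPv6."""
    host, _, port = address.rpartition(":")
    return host, int(port)


def unreachable_addresses(routing_table, timeout=3.0):
    """Return (role, address, error) for each address a TCP connect fails on."""
    failures = []
    for entry in routing_table:
        for address in entry["addresses"]:
            host, port = split_address(address)
            try:
                # Open and immediately close a TCP connection.
                with socket.create_connection((host, port), timeout=timeout):
                    pass
            except OSError as exc:  # covers refused, timeout, DNS failure
                failures.append((entry["role"], address, str(exc)))
    return failures
```

Run it on the same machine as the failing client, passing the rows returned by the procedure. An empty list means every advertised address accepted a TCP connection from there.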

I got the following:

[
  {
    "addresses": [
      "A.B.C.D:7687"
    ],
    "role": "WRITE"
  },
  {
    "addresses": [
      "A.B.C.D:7687"
    ],
    "role": "READ"
  },
  {
    "addresses": [
      "A.B.C.D:7687"
    ],
    "role": "ROUTE"
  }
]

The above seems alright. I still see the above two errors. Do you think the real issue is with the encryption settings in the driver? Please see Which version of Neo 4J driver to install for Neo4j versions 4.0 and 4.1 - #5 by MuddyBootsCode

Lavanya

Thanks David - this has really helped me track down a similar problem I was having!