Hi there, I'm launching Neo4j Enterprise on Google Cloud Platform. At the time of this writing, the current version of Neo4j is 4.0.2.
That being said, when I launch a cluster in GCP, the cluster is created successfully. Great.
Then, when I go to the IP address associated with the cluster to log in and confirm the cluster is operational, a host of issues immediately arise:
- Websocket connection issue
- Authentication issue
How has this affected us?
I've lost days debugging these issues. I don't have days to lose: I'm on a time-sensitive project.
How should it be different?
I expect Neo4j Enterprise to work, right out of the box. You deploy a cluster. Upon successful cluster creation, you go to the designated cluster entry point, auth in, and bam, you're in and everything works.
Instead, the current experience is that after successful cluster creation, you cannot auth in, and you lose days trying to figure things out, leaving you frustrated enough to write in this forum. lol
Steps I've taken to resolve the issue:
- The first issue is the websocket issue. The error says the browser's security settings do not allow the connection. I noticed the SSL cert was bad, so I figured the two were connected. I followed @david_allen's wonderful blog post, but it's written for v3, not v4, so the core settings @david_allen suggests are no longer relevant. Please, @david_allen, consider writing another blog post for v4. It'd help the community save hours of work, I'm sure of it. After hours of hacking at Neo4j's settings (trying to translate what David was saying for v3 to v4), I was able to get the single Neo4j VM to serve our SSL cert as the default cert. I could successfully go to our Neo4j sub-domain (which now had a valid cert) but found that the websocket issue persisted, FML, lol. So apparently the SSL cert is not related to the websocket issue. I did find that if I opened port 7474 in the GCP Firewall and went there, I could auth in without a websocket issue. So HTTP + auth=false => no websocket issue. HTTPS + auth=false => websocket issue still...hey...at least HTTP was kinda working, right? Turns out auth errors / write errors began at this point (second issue described below).
Summary: it looks like the websocket issue isn't related to a valid SSL cert being present; it's more likely related to a security setting in neo4j.conf. I'm not sure which security setting it's related to. You can disable security and hit HTTP to get around the issue, but that's a security flaw, not a solution for an enterprise setting.
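One way to narrow this down, as a minimal sketch rather than a confirmed diagnosis: a page loaded over HTTPS is generally only allowed to open secure (wss) websockets, so it may be worth checking whether the port the Browser connects to for queries also completes a TLS handshake with a trusted cert, not just the HTTPS port. The helper names below are hypothetical, the host `graph.example.com` is a placeholder, and 7687 is assumed to be the Bolt port (the default).

```python
import socket
import ssl
from urllib.parse import urlparse


def websocket_scheme_for(page_url: str) -> str:
    """Return the WebSocket scheme a browser will generally accept from this page."""
    # Pages loaded over https are generally restricted to secure (wss) sockets;
    # a plain ws connection from a secure page is treated as mixed content.
    return "wss" if urlparse(page_url).scheme == "https" else "ws"


def port_speaks_tls(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TLS handshake with a trusted certificate succeeds."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:
        # Covers refused connections, timeouts, and TLS/certificate errors.
        return False


if __name__ == "__main__":
    host = "graph.example.com"  # placeholder, not a real deployment
    print("scheme needed from an HTTPS page:",
          websocket_scheme_for(f"https://{host}:7473/browser/"))
    print("TLS on assumed Bolt port 7687:", port_speaks_tls(host, 7687))
```

If the second check comes back `False` while the HTTPS page itself loads fine, that would be consistent with HTTP working and HTTPS failing, though it does not prove it is the cause.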
- I can now auth into the database with HTTP (I'll take the mini-victory), but these errors then arise:
- After successfully logging in:
** Cannot write to database, only a leader can write, this is a follower. (note: every VM in the cluster says this. So you cannot write at all? Who the heck is the leader -- I just need to write to the DB damn it, lol)
** Eventually, it says cannot write / read, because you've tried to authenticate too many times from this client...making me unable to run any commands (note: auth=false, so why is this popping up?)
- Eventually, when I go back to the Browser interface for the DB I see:
The client is unauthorized due to authentication failure
(even though auth=false and it previously just allowed me in with the same user / pw). Now I can no longer auth into the DB, FML again.
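For the follower/leader error, a small sketch of how the leader could be picked out of a cluster overview, assuming the rows come from `CALL dbms.cluster.overview()` and that each row carries an `addresses` list plus a `databases` map of database name to role. That row shape is an assumption about 4.0 and should be checked against the real output; the function name and the sample addresses are made up.

```python
from typing import Dict, Iterable, List, Optional


def leader_addresses(rows: Iterable[dict], database: str = "neo4j") -> Optional[List[str]]:
    """Return the addresses of the member that is LEADER for `database`, if any."""
    for row in rows:
        # Assumed shape: {"addresses": [...], "databases": {"neo4j": "LEADER", ...}}
        roles: Dict[str, str] = row.get("databases", {})
        if roles.get(database) == "LEADER":
            return list(row.get("addresses", []))
    return None  # no member currently reports itself as leader for this database


if __name__ == "__main__":
    # Hypothetical overview rows for a three-member cluster.
    sample = [
        {"addresses": ["bolt://10.0.0.1:7687"],
         "databases": {"neo4j": "FOLLOWER", "system": "LEADER"}},
        {"addresses": ["bolt://10.0.0.2:7687"],
         "databases": {"neo4j": "LEADER", "system": "FOLLOWER"}},
        {"addresses": ["bolt://10.0.0.3:7687"],
         "databases": {"neo4j": "FOLLOWER", "system": "FOLLOWER"}},
    ]
    print(leader_addresses(sample))
```

Connecting with a `neo4j://` URI instead of `bolt://` asks the driver to route writes to the leader, so looking the leader up by hand like this is mainly useful for debugging.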
I'm about to destroy this cluster and repeat this insane set of steps to try to resolve these issues. I'd sincerely appreciate help from anyone who knows how to solve any of them.
To the fine folks @ Neo4j:
Look, I'm no noob...I've built two graphs with billions of nodes each, operating at web scale, but I gotta say, this experience seems whack, even to a senior software engineer. lol
I expect more from Neo4j Enterprise. I expect it to work out of the box. I can't tell you how many times I've torn down this cluster and recreated a new one, trying to resolve these little bugs and just get v4 to work. I, and I'm sure the community agrees, expect there to be no websocket issue. If there is a prerequisite step that your engineers are aware of that's needed to resolve the issue, then a patch should be released or the solution should be baked into the deployment script (i.e., if the websocket issue was caused by missing SSL certs, then when making a cluster, drag and drop your SSL cert here to configure your cluster).
Btw, v3.5 worked beautifully right out of the box. No issues. It's just 4.0 that has these issues. I'm losing days on this...I'm about to lose another day...
Can you answer these questions:
- How do you solve the websocket connection issue for neo4j 4.0? It's preventing developers from authing into the DB
- Are there additional steps required to make a GCP Cloud Deployment work? Why do the initial username / initial password stop working after some time? Are you supposed to change them?
- RE: Follower vs leader issue: who am I supposed to be asking to do a write?
Thanks for being as patient with this topic as I've been with the issue
- neo4j version: 4.0.2, browser version: 4.0.5