Can't get Neo4j 4 running (AWS or any cloud deployment)

I have been converting my project from SQL to Neo4j. All of my tests pass and everything works in Neo4j Desktop on my local machine using 4.0.4.

I have now been trying for several weeks to get it running on a server so I can actually integrate it with my application, and I have run into nothing but trouble.

First I tried to use Graphene, but they don't support 4.x yet, so I thought I would use Aura. There was nothing in the documentation about the version Aura was running, and I assumed it was 4.x (I know, I know, that's what I get for assuming). After much pain I got some things working on Aura, but apparently a number of my queries use 4.x-only Cypher syntax.

So the next thing I tried was the AWS Marketplace CloudFormation template. Well, that doesn't work either. Every time you run it you get:

The following resource(s) failed to create: [WaitOnPasswordReset]. . Rollback requested by user.
2020-06-09 12:10:28 UTC-0500	WaitOnPasswordReset	CREATE_FAILED	WaitCondition timed out. Received 0 conditions when expecting 1

So now I am several months into this project and I don't know where to go next, as I have no way to actually stand up a server.

Is 4.x not production ready? What am I doing wrong?

Please note I am a developer and not an ops guy.

Sorry you're running into trouble here.

What that error message means is that the cluster is not starting correctly, so the CloudFormation deploy cannot succeed. To diagnose what's gone wrong, look at the events in the CloudFormation stack logs to find out which resource failed. If you post some updates here on what's not working, maybe we can help further. There are a lot of reasons things can fail; for example, VM creation can fail because of a quota issue, or because something is misconfigured. The best way to figure this out is to get a dump of the CloudFormation log events, see what's there, and decide where to go next.
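If it helps, here's a rough sketch of pulling just the failed events with the AWS CLI; the stack name is a placeholder for your own:

```shell
# Placeholder stack name -- substitute the name you gave your stack.
STACK_NAME="my-neo4j-stack"

# Dump the stack's events and keep only the failures.
# Guarded so this is a no-op where the AWS CLI isn't installed/configured.
command -v aws >/dev/null && aws cloudformation describe-stack-events \
  --stack-name "$STACK_NAME" \
  --query 'StackEvents[?contains(ResourceStatus, `FAILED`)].[LogicalResourceId,ResourceStatusReason]' \
  --output table || true
```

The same event list is visible in the console under the stack's Events tab.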

Also, please specify exactly how you're launching it: from the Marketplace, or from templates that you downloaded.

Hello,

I was running the AWS Marketplace install of a causal cluster:
https://aws.amazon.com/marketplace/pp/B07D441G55?qid=1591746181777&sr=0-2&ref_=srh_res_product_title

Timestamp Logical ID Status Status reason
2020-06-09 12:17:33 UTC-0500 GTFNEO4J ROLLBACK_COMPLETE -
2020-06-09 12:17:32 UTC-0500 VPC DELETE_COMPLETE -
2020-06-09 12:17:32 UTC-0500 InternetGateway DELETE_COMPLETE -
2020-06-09 12:17:17 UTC-0500 InternetGateway DELETE_IN_PROGRESS -
2020-06-09 12:17:17 UTC-0500 VPC DELETE_IN_PROGRESS -
2020-06-09 12:17:16 UTC-0500 AttachGateway DELETE_COMPLETE -
2020-06-09 12:17:09 UTC-0500 DNSZone DELETE_COMPLETE -
2020-06-09 12:15:39 UTC-0500 Subnet1 DELETE_COMPLETE -
2020-06-09 12:15:39 UTC-0500 Subnet2 DELETE_COMPLETE -
2020-06-09 12:15:38 UTC-0500 Subnet0 DELETE_COMPLETE -
2020-06-09 12:15:27 UTC-0500 ReadOwnTags DELETE_COMPLETE -
2020-06-09 12:15:25 UTC-0500 sgNeo4jEnterprise DELETE_COMPLETE -
2020-06-09 12:15:25 UTC-0500 ReadOwnTags DELETE_IN_PROGRESS -
2020-06-09 12:15:24 UTC-0500 instProfNeo4jEnterprise DELETE_COMPLETE -
2020-06-09 12:15:24 UTC-0500 StackTokenWaitHandle DELETE_COMPLETE -
2020-06-09 12:15:23 UTC-0500 Subnet1 DELETE_IN_PROGRESS -
2020-06-09 12:15:23 UTC-0500 sgNeo4jEnterprise DELETE_IN_PROGRESS -
2020-06-09 12:15:23 UTC-0500 instProfNeo4jEnterprise DELETE_IN_PROGRESS -
2020-06-09 12:15:23 UTC-0500 StackTokenWaitHandle DELETE_IN_PROGRESS -
2020-06-09 12:15:23 UTC-0500 Subnet2 DELETE_IN_PROGRESS -
2020-06-09 12:15:23 UTC-0500 Neo4jServer1 DELETE_COMPLETE -
2020-06-09 12:15:23 UTC-0500 Neo4jServer2 DELETE_COMPLETE -
2020-06-09 12:15:22 UTC-0500 Subnet0 DELETE_IN_PROGRESS -
2020-06-09 12:15:22 UTC-0500 Neo4jServer0 DELETE_COMPLETE -
2020-06-09 12:14:36 UTC-0500 Neo4jServer1 DELETE_IN_PROGRESS -
2020-06-09 12:14:36 UTC-0500 DNSZone DELETE_IN_PROGRESS -
2020-06-09 12:14:36 UTC-0500 Neo4jServer2 DELETE_IN_PROGRESS -
2020-06-09 12:14:36 UTC-0500 Neo4jServer1DNS DELETE_COMPLETE -
2020-06-09 12:14:35 UTC-0500 Neo4jServer2DNS DELETE_COMPLETE -
2020-06-09 12:14:35 UTC-0500 Neo4jServer0 DELETE_IN_PROGRESS -
2020-06-09 12:14:35 UTC-0500 Neo4jServer0DNS DELETE_COMPLETE -
2020-06-09 12:11:18 UTC-0500 NetworkAcl DELETE_COMPLETE -
2020-06-09 12:11:18 UTC-0500 RouteTable DELETE_COMPLETE -
2020-06-09 12:11:18 UTC-0500 NetworkAcl DELETE_IN_PROGRESS -
2020-06-09 12:11:18 UTC-0500 RouteTable DELETE_IN_PROGRESS -
2020-06-09 12:11:17 UTC-0500 AttachGateway DELETE_IN_PROGRESS -
2020-06-09 12:11:17 UTC-0500 Int3NetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 SubnetNetworkAclAssociation1 DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 HTTPSIngressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 Neo4jHTTPSIngressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 InboundResponsePortsNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 SSHEgressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 SSHIngressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 Neo4jHTTPSEgressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 HTTPIngressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 Int2NetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 Int1NetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 SubnetNetworkAclAssociation0 DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 SubnetNetworkAclAssociation2 DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 HTTPEgressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 BoltIngressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 SubnetRouteTableAssociation0 DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 HTTPSEgressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 BoltEgressNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 OutBoundResponsePortsNetworkAclEntry DELETE_COMPLETE -
2020-06-09 12:11:17 UTC-0500 SubnetRouteTableAssociation1 DELETE_COMPLETE -
2020-06-09 12:11:16 UTC-0500 SubnetRouteTableAssociation2 DELETE_COMPLETE -
2020-06-09 12:11:16 UTC-0500 Route DELETE_COMPLETE -
2020-06-09 12:11:01 UTC-0500 SSHIngressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 Neo4jServer2DNS DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 SSHEgressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 Neo4jHTTPSEgressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 WaitOnPasswordReset DELETE_COMPLETE -
2020-06-09 12:11:01 UTC-0500 Int1NetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 SubnetNetworkAclAssociation0 DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 SubnetRouteTableAssociation0 DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 HTTPSIngressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 Int3NetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 SubnetRouteTableAssociation1 DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 Neo4jHTTPSIngressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 Int2NetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 SubnetNetworkAclAssociation1 DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 BoltIngressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 HTTPIngressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 Neo4jServer1DNS DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 HTTPEgressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 WaitOnPasswordReset DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 Neo4jServer0DNS DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 HTTPSEgressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 InboundResponsePortsNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 BoltEgressNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 SubnetNetworkAclAssociation2 DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 OutBoundResponsePortsNetworkAclEntry DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 SubnetRouteTableAssociation2 DELETE_IN_PROGRESS -
2020-06-09 12:11:01 UTC-0500 Route DELETE_IN_PROGRESS -
2020-06-09 12:10:29 UTC-0500 GTFNEO4J ROLLBACK_IN_PROGRESS The following resource(s) failed to create: [WaitOnPasswordReset]. . Rollback requested by user.
2020-06-09 12:10:28 UTC-0500 WaitOnPasswordReset CREATE_FAILED WaitCondition timed out. Received 0 conditions when expecting 1
2020-06-09 11:40:12 UTC-0500 Neo4jServer0DNS CREATE_COMPLETE -
2020-06-09 11:40:12 UTC-0500 Neo4jServer2DNS CREATE_COMPLETE -
2020-06-09 11:39:21 UTC-0500 Neo4jServer1DNS CREATE_COMPLETE -
2020-06-09 11:35:48 UTC-0500 Neo4jServer1DNS CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:35:48 UTC-0500 Neo4jServer1DNS CREATE_IN_PROGRESS -
2020-06-09 11:35:45 UTC-0500 Neo4jServer1 CREATE_COMPLETE -
2020-06-09 11:35:28 UTC-0500 Neo4jServer2DNS CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:35:28 UTC-0500 Neo4jServer0DNS CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:35:28 UTC-0500 Neo4jServer0DNS CREATE_IN_PROGRESS -
2020-06-09 11:35:28 UTC-0500 Neo4jServer2DNS CREATE_IN_PROGRESS -
2020-06-09 11:35:27 UTC-0500 WaitOnPasswordReset CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:35:27 UTC-0500 WaitOnPasswordReset CREATE_IN_PROGRESS -
2020-06-09 11:35:24 UTC-0500 Neo4jServer0 CREATE_COMPLETE -
2020-06-09 11:35:24 UTC-0500 Neo4jServer2 CREATE_COMPLETE -
2020-06-09 11:35:17 UTC-0500 DNSZone CREATE_COMPLETE -
2020-06-09 11:34:53 UTC-0500 Neo4jServer0 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:34:52 UTC-0500 Neo4jServer2 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:34:52 UTC-0500 Neo4jServer1 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:34:51 UTC-0500 Neo4jServer1 CREATE_IN_PROGRESS -
2020-06-09 11:34:51 UTC-0500 Neo4jServer0 CREATE_IN_PROGRESS -
2020-06-09 11:34:51 UTC-0500 Neo4jServer2 CREATE_IN_PROGRESS -
2020-06-09 11:34:48 UTC-0500 instProfNeo4jEnterprise CREATE_COMPLETE -
2020-06-09 11:33:19 UTC-0500 SubnetRouteTableAssociation1 CREATE_COMPLETE -
2020-06-09 11:33:19 UTC-0500 Route CREATE_COMPLETE -
2020-06-09 11:33:19 UTC-0500 SubnetRouteTableAssociation0 CREATE_COMPLETE -
2020-06-09 11:33:19 UTC-0500 SubnetNetworkAclAssociation2 CREATE_COMPLETE -
2020-06-09 11:33:19 UTC-0500 SubnetRouteTableAssociation2 CREATE_COMPLETE -
2020-06-09 11:33:19 UTC-0500 SubnetNetworkAclAssociation0 CREATE_COMPLETE -
2020-06-09 11:33:18 UTC-0500 SubnetNetworkAclAssociation1 CREATE_COMPLETE -
2020-06-09 11:33:12 UTC-0500 HTTPIngressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:04 UTC-0500 SubnetRouteTableAssociation1 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:33:04 UTC-0500 Route CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:33:04 UTC-0500 SubnetRouteTableAssociation0 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:33:04 UTC-0500 SubnetNetworkAclAssociation2 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:33:03 UTC-0500 SubnetRouteTableAssociation2 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:33:03 UTC-0500 SubnetNetworkAclAssociation0 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:33:03 UTC-0500 Route CREATE_IN_PROGRESS -
2020-06-09 11:33:03 UTC-0500 SubnetRouteTableAssociation1 CREATE_IN_PROGRESS -
2020-06-09 11:33:03 UTC-0500 SubnetNetworkAclAssociation2 CREATE_IN_PROGRESS -
2020-06-09 11:33:03 UTC-0500 SubnetRouteTableAssociation0 CREATE_IN_PROGRESS -
2020-06-09 11:33:03 UTC-0500 SubnetNetworkAclAssociation1 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:33:03 UTC-0500 SubnetNetworkAclAssociation0 CREATE_IN_PROGRESS -
2020-06-09 11:33:03 UTC-0500 SubnetRouteTableAssociation2 CREATE_IN_PROGRESS -
2020-06-09 11:33:03 UTC-0500 SSHIngressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:03 UTC-0500 HTTPSIngressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:03 UTC-0500 OutBoundResponsePortsNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:03 UTC-0500 InboundResponsePortsNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:03 UTC-0500 Neo4jHTTPSIngressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:03 UTC-0500 HTTPEgressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:02 UTC-0500 SubnetNetworkAclAssociation1 CREATE_IN_PROGRESS -
2020-06-09 11:33:02 UTC-0500 Int1NetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:02 UTC-0500 BoltEgressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:02 UTC-0500 Neo4jHTTPSEgressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:02 UTC-0500 BoltIngressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:02 UTC-0500 Int3NetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:02 UTC-0500 HTTPSEgressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:02 UTC-0500 Int2NetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:02 UTC-0500 SSHEgressNetworkAclEntry CREATE_COMPLETE -
2020-06-09 11:33:00 UTC-0500 Subnet2 CREATE_COMPLETE -
2020-06-09 11:33:00 UTC-0500 AttachGateway CREATE_COMPLETE -
2020-06-09 11:33:00 UTC-0500 Subnet1 CREATE_COMPLETE -
2020-06-09 11:33:00 UTC-0500 Subnet0 CREATE_COMPLETE -
2020-06-09 11:32:56 UTC-0500 HTTPIngressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:56 UTC-0500 HTTPIngressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:49 UTC-0500 sgNeo4jEnterprise CREATE_COMPLETE -
2020-06-09 11:32:48 UTC-0500 sgNeo4jEnterprise CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:48 UTC-0500 instProfNeo4jEnterprise CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 SSHIngressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 OutBoundResponsePortsNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 HTTPSIngressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 HTTPEgressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 Neo4jHTTPSIngressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 instProfNeo4jEnterprise CREATE_IN_PROGRESS -
2020-06-09 11:32:47 UTC-0500 InboundResponsePortsNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 Int1NetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 BoltEgressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 Neo4jHTTPSEgressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 BoltIngressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 Int2NetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 Int3NetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 HTTPSEgressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 SSHEgressNetworkAclEntry CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:47 UTC-0500 SSHIngressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:47 UTC-0500 HTTPSIngressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:47 UTC-0500 OutBoundResponsePortsNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:47 UTC-0500 HTTPEgressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:47 UTC-0500 Neo4jHTTPSIngressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:47 UTC-0500 Int1NetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:47 UTC-0500 InboundResponsePortsNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:47 UTC-0500 Neo4jHTTPSEgressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:46 UTC-0500 BoltIngressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:46 UTC-0500 BoltEgressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:46 UTC-0500 HTTPSEgressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:46 UTC-0500 Int2NetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:46 UTC-0500 Int3NetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:46 UTC-0500 SSHEgressNetworkAclEntry CREATE_IN_PROGRESS -
2020-06-09 11:32:45 UTC-0500 ReadOwnTags CREATE_COMPLETE -
2020-06-09 11:32:45 UTC-0500 AttachGateway CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:44 UTC-0500 NetworkAcl CREATE_COMPLETE -
2020-06-09 11:32:44 UTC-0500 RouteTable CREATE_COMPLETE -
2020-06-09 11:32:44 UTC-0500 DNSZone CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:44 UTC-0500 AttachGateway CREATE_IN_PROGRESS -
2020-06-09 11:32:44 UTC-0500 Subnet0 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:44 UTC-0500 Subnet2 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:44 UTC-0500 Subnet1 CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:44 UTC-0500 NetworkAcl CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:44 UTC-0500 RouteTable CREATE_IN_PROGRESS Resource creation Initiated
2020-06-09 11:32:44 UTC-0500 DNSZone CREATE_IN_PROGRESS -
2020-06-09 11:32:44 UTC-0500 Subnet2 CREATE_IN_PROGRESS -
2020-06-09 11:32:43 UTC-0500 Subnet0 CREATE_IN_PROGRESS -
2020-06-09 11:32:43 UTC-0500 sgNeo4jEnterprise CREATE_IN_PROGRESS -
2020-06-09 11:32:43 UTC-0500 Subnet1 CREATE_IN_PROGRESS -
2020-06-09 11:32:43 UTC-0500 NetworkAcl CREATE_IN_PROGRESS -

That's the output in the CloudFormation events. I didn't see any related logs in CloudWatch.

I don't see any more useful details anywhere.

@bledi.feshti1 can you have a look at this? This interaction pattern shows that all of the components were created correctly (no errors), but the wait on cluster formation failed, meaning the cluster did not properly form before the timeout. There are two possible causes:

  • System misconfiguration (unlikely, as these templates have been through previous testing)
  • Race condition (the cluster does not form quickly enough), so the timeout that signals cluster failure fires seconds or a minute before the cluster would have formed.

My best bet is that it's the second case: if you increase the timeout for the CF deploy, it'll take a minute or two longer but the cluster will form. But this needs checking.
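One way to check is to relaunch with rollback disabled, so the instances survive the timeout and can be inspected over SSH. A sketch with the AWS CLI; the stack name and template URL are placeholders, and you'd pass whatever parameters your template requires:

```shell
# Placeholder stack name for a debug run.
STACK="neo4j-cc-debug"

# --disable-rollback keeps the VMs around after a WaitCondition timeout.
# The template URL is a placeholder -- use the URL of the actual
# marketplace/downloaded template. CAPABILITY_IAM is needed because the
# template creates IAM resources. Guarded in case the AWS CLI is absent.
command -v aws >/dev/null && aws cloudformation create-stack \
  --stack-name "$STACK" \
  --template-url "https://example-bucket.s3.amazonaws.com/neo4j-causal-cluster.json" \
  --capabilities CAPABILITY_IAM \
  --disable-rollback || true
```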

Did you figure out how to fix this or work around it?

I'm trying to set up an Enterprise Causal Cluster on Amazon, using the stack template.
I have tried creating the stack multiple times with 3.5.15, 4.0.3, 4.0.4, and 4.0.5, with a timeout of 60 minutes.

Each time it looks like everything is created, but it hangs on the last step until posting a final event (WaitOnPasswordReset CREATE_FAILED: WaitCondition timed out. Received 0 conditions when expecting 1) and rolling back, exactly like the log above.

If this is a different issue, I apologize for the hijack; otherwise I would appreciate a heads-up with a template that actually works, if you managed to get one.

Thank you in advance.

Can you please indicate the command or method you're using to launch, and what region you're launching into? We did look at this and weren't able to replicate the problem. It's true that the deploy will fail if the timeout is too short, and that the 4.0 series takes a bit longer than 3.5 to finish cluster formation. But in either case 10 minutes should be more than enough; if it's failing after 60 minutes, something else is going on.

To look into this we'd need more information on exactly how the deploy is happening, and whether you have other network rules in place in your account. What's failing is the cluster formation step, which requires that the machines in the cluster be able to reach one another on the right ports. So (I'm guessing a bit blindly here) there may be a network situation that is preventing them from contacting one another.

All I was doing was picking Causal Cluster 4.0.5 from the AWS Marketplace and deploying it to the us-east-2 region on three t2.medium machines. Everything else was the defaults, and it failed every time I tried. I have never gotten it to work and am currently using a single Enterprise node from an AMI; I do not yet have a plan for how I am going to run in production once I get all my bugs figured out.

And the same here, except us-east-1 and default machines (r4.large x 3).

Product code: 1emg6yskh0jf81czgzfadiu9w, if that helps.

Settings I have changed are:

  • Version
  • Stack name
  • SSH Key name (I could log into the machines with it, when disabling the rollback)
  • IP whitelist (0.0.0.0/0 for testing)
  • Password (changeme for testing)
  • Sometimes also the wait timeout and/or rollback on fail

And then next, accept, next. Creation started, runs for quite a while, blows up and rolls back.

On a side note, it just struck me that you could have different password constraints in the system compared to the template, so I am retrying with a more complex password.

Edit:
"Complex" password also failed, same symptoms.

As for the second part of your question: in my case I am running as my developer role on a company account (IAM account?) that also serves the rest of our infrastructure. Neo4j creates its own VPC (Neo4jVPC-{stack name}), subnet (same naming scheme), and security group ({stack name}-sgNeo4jEnterprise-1{12 char HEX}), as far as I can tell.

I can log into the machines via the public IP using my SSH key, and jump between the machines using the private IPs (again using SSH and the key).

If you need anything else, please be specific; I'm pretty new to most things AWS myself.

Again, thank you in advance

Give it 10 minutes or so to form; it should have formed by then.

Then SSH into the machines and grab a copy of /var/lib/neo4j/logs/debug.log from all 3 VMs. Those logs will tell you what's going on. Scan them for errors or exceptions, in particular anything that mentions akka. Report back with those errors and we can diagnose.
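A rough sketch of collecting those logs in one pass; the key path, SSH user, and host list are placeholders for your own values:

```shell
# Placeholders -- substitute your key, SSH user, and the VMs' public IPs.
KEY="$HOME/.ssh/my-neo4j-key.pem"
HOSTS="node0-public-ip node1-public-ip node2-public-ip"

# Copy debug.log off each cluster member.
for h in $HOSTS; do
  scp -i "$KEY" "ubuntu@$h:/var/lib/neo4j/logs/debug.log" "debug-$h.log" || true
done

# Errors, exceptions, and anything mentioning akka are the lines to report.
grep -ihE 'error|exception|akka' debug-*.log 2>/dev/null | head -50 || true
```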

Node0.txt (21.8 KB) Node1.txt (55.3 KB) Node2.txt (14.9 KB)

I have truncated them a bit for brevity. The "Failed to load" log lines in Node1.txt appear on all three instances.

Private IPs:
Node0: 10.0.0.46
Node1: 10.0.1.222
Node2: 10.0.2.84

Public IPs:
Node0: 34.201.44.16
Node1: 54.92.147.57
Node2: 34.234.73.125

Nodes 0 and 2 seem to know the IPs of the others (in the logs). Node 1 either doesn't, or isn't logging that it does. It does, however, log that it is available on 172.31.45.91, which is not part of this cluster.

And it looks like node0 is trying to start the server while it is already running.

Do you need anything else?

I see two different errors in your logs that are directly preventing cluster formation. It looks like you have some customization in either your configuration or your network settings. I can't speak to how to fix this because I'm not sure what your CF looks like, but here is one issue on node0:

2020-06-23 15:57:05.397+0000 ERROR [a.i.TcpListener] Bind failed for TCP channel on endpoint [/10.0.0.46:5000] Address already in use
java.net.BindException: Address already in use

Without port 5000 open for cluster traffic you can't form a cluster. Best guess: something in your neo4j configuration is putting another process on port 5000, or possibly you're binding a different neo4j service to port 5000. Either way, this is a clear problem with node0.
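A quick way to see whether something already holds port 5000 on node0; a sketch assuming `ss` is available (substitute `netstat -tln` on older images):

```shell
# List TCP listeners and check whether anything is bound to port 5000.
# If Neo4j's discovery service isn't the only owner, that's the conflict.
ss -ltn 2>/dev/null | awk '$4 ~ /:5000$/' | grep -q . \
  && echo "something is already listening on :5000" \
  || echo "nothing is listening on :5000"
```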

On node1:

2020-06-23 15:56:51.008+0000 ERROR [a.e.DummyClassForStringSources] Outbound message stream to [akka://cc-discovery-actor-system@node0.neo4j:5000] failed. Restarting it. Tcp command [Connect(node0.neo4j:5000,None,List(),Some(10000 milliseconds),true)] failed because of java.net.UnknownHostException: node0.neo4j Tcp command [Connect(node0.neo4j:5000,None,List(),Some(10000 milliseconds),true)] failed because of java.net.UnknownHostException: node0.neo4j
akka.stream.StreamTcpException: Tcp command [Connect(node0.neo4j:5000,None,List(),Some(10000 milliseconds),true)] failed because of java.net.UnknownHostException: node0.neo4j
Caused by: java.net.UnknownHostException: node0.neo4j

In this case, node1 can't reach node0.neo4j because that DNS name doesn't even resolve. Our CloudFormation templates have provisions for these private internal DNS addresses (node0.neo4j, node1.neo4j, node2.neo4j). I would look into any custom configuration you have that could interfere with these DNS names.

Overall, it looks like your cluster has no chance to form correctly, due to a combination of port conflicts and network/DNS misconfiguration.
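To confirm the DNS side, a sketch you can run on each VM itself:

```shell
# Each of these names should resolve to a 10.0.x.x private address inside
# the stack's VPC; a "does not resolve" line confirms the DNS problem.
for n in node0.neo4j node1.neo4j node2.neo4j; do
  getent hosts "$n" || echo "$n does not resolve"
done
```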

Thank you for the feedback.

I agree with your assessment; however, I'm unsure how to actually do anything about it. Node0 is created by the CloudFormation template; I'm not sure how anything in our setup would inject itself into the machine and start running on port 5000 during provisioning. Isn't it more likely that something isn't shut down properly when Neo4j is initially stopped during setup?
There is nowhere in the CF template where I picked any port, let alone 5000.

As for the other part: the machines are running in their own VPC, and they can reach each other using SSH. I can't say for certain that we don't have a DNS setting that prevents them from registering themselves, though. Would that be enough to prevent the cluster from forming?

All I can tell you is which parts of the CF template provide for the things that appear to be broken in your install. This is the bit that creates the DNS records; you will see something similar to it repeated in the CF template with template substitutions:

{
    "Type": "AWS::Route53::RecordSet",
    "Condition" : "{{condition}}",
    "DependsOn" : "DNSZone",
    "Properties": {
        "HostedZoneId": { "Ref" : "DNSZone" },
        "Comment" : "DNS names for neo4j {{groupName}} {{i}}.",  
        "Name" : "{{groupName}}{{i}}.{{INTERNAL_DNS_TLD}}.",
        "Type" : "A",
        "TTL" : "900",
        "ResourceRecords" : [
            {# Map DNS to the **private IP**, not PublicIp, because it's
               inside the VPC and cluster coordination traffic isn't
               allowed outside anyway. #}
            { "Fn::GetAtt" : [ "Neo4jServer{{i}}", "PrivateIp" ] }
        ]
    }
}

You can look for the corresponding records in AWS (Route 53).
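A sketch of checking those records with the AWS CLI; the zone lookup by name assumes the template's internal TLD is `neo4j`, so adjust if yours was customized:

```shell
# The A records you expect to see, from the template's naming scheme.
EXPECTED="node0.neo4j node1.neo4j node2.neo4j"

# Find the private hosted zone the stack created, then list its A records.
# Guarded so this is a no-op where the AWS CLI isn't available.
command -v aws >/dev/null && {
  ZONE_ID=$(aws route53 list-hosted-zones-by-name --dns-name neo4j \
    --query 'HostedZones[0].Id' --output text)
  aws route53 list-resource-record-sets --hosted-zone-id "$ZONE_ID" \
    --query 'ResourceRecordSets[?Type==`A`]'
} || true
```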

Similarly, you can inspect the open ports on each of the VMs. You should see 5000, 6000, and 7000 open to the internal VPC only, and 7473 and 7687 open outside of the VPC.
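And a sketch of probing those ports between VMs using bash's /dev/tcp; the peer address is a placeholder for one node's private IP:

```shell
# Placeholder -- substitute a peer node's private IP from your VPC.
PEER="10.0.1.222"

# From one cluster VM, check reachability of each cluster port on the peer.
for p in 5000 6000 7000 7473 7687; do
  if timeout 1 bash -c "exec 3<>/dev/tcp/$PEER/$p" 2>/dev/null; then
    echo "port $p reachable on $PEER"
  else
    echo "port $p NOT reachable on $PEER"
  fi
done
```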

Sorry -- unfortunately, with customizations and customer tenancies in AWS, there are a lot of ways these deploys can go wrong that we can't see, because we can't see your AWS setup. Between these logs and those bits, you ought to be able to go through your configuration and find what's missing or wrong. As I said, we tried to replicate this on our side and couldn't. That means either a modification was made to the CF template that interfered with something, or something about your AWS tenancy/policy/quotas interferes, or there's some exotic third possibility I just don't know about.

I realize the complexity involved can be quite high. I tried reducing it by running the stack creation on my own personal account instead, to see if it behaved differently. Note that I am still provisioning in N. Virginia (us-east-1) and the remaining choices are identical (I only ran the test with 4.0.5, though).

This time I managed to get the cluster up and running, and I could connect to it on :7473 through my browser, add a node, and find it again using Cypher. So it seems you are right that some setting is preventing the cluster from forming properly on our company account.

However, the WaitOnPasswordReset step never completed, so after roughly 30 minutes (my chosen timeout) everything rolled back again. So it also seems that something in that script is incompatible with my other account (which I have previously used only for EC2).

I am also running into the same issue mentioned by @mike2 and @thomas4 when trying to run the CloudFormation template for the Neo4j Enterprise Causal Cluster using 3 nodes on r4.large instances. On the "WaitOnPasswordReset" step of the stack creation, the following error occurs:

WaitCondition timed out. Received 0 conditions when expecting 1

After that, the entire Stack rolls back and deletes all resources.

All resources seem to be created before the wait on password reset stage.

Any help would be greatly appreciated, as this stack is EXACTLY what we need for our APIs and we are no longer at a phase where we can sustain a single Neo4j instance with routine snapshots.

Thanks!

Same error with AWS. @dayel, @mike2, @thomas4, were you able to solve it?

I never found a solution, no.
We ended up using a different product, since we don't need the clustered version yet and I can't really justify spending any more company resources on this problem.
But I would still like to hear if anyone else finds a solution, or if the problem goes away.

A bug was recently found in the AWS Marketplace template that mostly affected people launching clusters with read replicas. A fix by @bledi.feshti is underway and I believe has been completed; we're mostly waiting for the new version to be listed on the AWS Marketplace. The listing can take up to 10 days to be approved, but should be ready soon.
