Deployments failing even in new regions

Something strange has happened with my app: it gradually dropped connections with its clients over the past half-hour or so, and then it stopped accepting connections entirely.

I noticed there was a host issue with my machines:

We are performing emergency maintenance on a host some of your apps instances are running on in SJC. Machines on this host may be unavailable until the maintenance is completed.

So I deployed to a new region to mitigate this:

fly scale count 2 --region ewr

But this failed. Health checks are failing with “connection refused”, which sounds like an internal networking issue.

As a trick to trigger a re-deploy, I like to use fly secrets deploy, but that failed as well:

Verifying if app can be safely deployed

Creating green machines
  Created machine 8731d7a06792d8 [app]
  Created machine 781337db990178 [app]
  Created machine 148e10e5f06078 [app]
  Created machine 32879220b51578 [app]

Waiting for all green machines to start
  Machine 148e10e5f06078 [app] - started
  Machine 32879220b51578 [app] - started
  Machine 781337db990178 [app] - started
  Machine 8731d7a06792d8 [app] - created
WARN error refreshing lease for machine 9080007eb94e98: failed to get lease on VM 9080007eb94e98: unauthorized
WARN error refreshing lease for machine 148e1179f9d018: failed to get lease on VM 148e1179f9d018: unauthorized
WARN error refreshing lease for machine 568365edae1dd8: failed to get lease on VM 568365edae1dd8: unauthorized
WARN error refreshing lease for machine e286d924b0e4e8: failed to get lease on VM e286d924b0e4e8: unauthorized
WARN error refreshing lease for machine 9080007eb94e98: failed to get lease on VM 9080007eb94e98: unauthorized
WARN error refreshing lease for machine 148e1179f9d018: failed to get lease on VM 148e1179f9d018: unauthorized
WARN error refreshing lease for machine 568365edae1dd8: failed to get lease on VM 568365edae1dd8: unauthorized
WARN error refreshing lease for machine e286d924b0e4e8: failed to get lease on VM e286d924b0e4e8: unauthorized

At which point I just ^C'd.

The new machines ostensibly start successfully; I can see my app’s initialization in the logs, and it appears to be listening on the correct port. But subsequently I see these two messages repeated over and over:

15:28:56 [PM05] failed to connect to machine: gave up after 15 attempts (in 8.109285926s)
15:28:56 [PR03] could not find a good candidate within 1 attempts at load balancing. last error: [PM05] failed to connect to machine: gave up after 15 attempts (in 8.085415579s)

Something is hella borked.

OK, I think this was actually something on my end, so now I feel foolish and will delete this thread :) The app was listening on the correct port, but the health check endpoint was timing out, so the “connection refused” error was kind of misleading.

(edit: oh, ok)
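
For anyone who lands here with the same symptom: a quick way to tell "nothing is listening" apart from "listening, but the health endpoint hangs" is to probe the endpoint yourself with a short timeout. Below is a minimal sketch using only the Python standard library. It is not part of flyctl or any Fly.io tooling, and the function name, the labels it returns, and the URL in the usage note are all assumptions made up for illustration.

```python
import socket
import urllib.error
import urllib.request


def classify_health_check(url: str, timeout_s: float = 2.0) -> str:
    """Probe `url` once and return a coarse label (hypothetical names):
    'ok', 'unhealthy', 'timeout', 'refused', or 'error'."""
    # Empty ProxyHandler: ignore proxy environment variables so the probe
    # talks to the target directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout_s) as resp:
            return "ok" if 200 <= resp.status < 300 else "unhealthy"
    except urllib.error.HTTPError:
        # The server answered, but with a 4xx/5xx status.
        return "unhealthy"
    except urllib.error.URLError as exc:
        # Failures while connecting or sending arrive wrapped in URLError.
        if isinstance(exc.reason, ConnectionRefusedError):
            return "refused"
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        return "error"
    except (socket.timeout, TimeoutError):
        # Connected fine, but no response arrived within timeout_s.
        return "timeout"
    except OSError:
        # Anything else at the socket level (e.g. connection reset).
        return "error"
```

Run from inside the machine against whatever port and path your own health check uses, for example `classify_health_check("http://127.0.0.1:8080/health")` (port and path are placeholders). A result of `"timeout"` rather than `"refused"` would point at a slow or hung handler, which is what happened in my case, rather than at networking.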
