App broken: could not find a good candidate within 90 attempts at load balancing.

I’m having the exact same issue. My app in the ams region has been completely broken for the past couple of days, as you can see from my Uptime Kuma graph. I didn’t change anything myself, and it had been running fine for months.

I’ve tried the following:

  • fly machine restart: didn’t fix anything
  • fly ssh console: it says Error: error connecting to SSH server: connect tcp ... operation timed out
  • fly scale count 0 and then fly scale count 1: didn’t fix anything
  • fly scale count 2: didn’t fix anything, both machines are now in a broken state

Any ideas? I need this to be up and running soon.

Update: found the problem. It turns out my app is waiting for my Postgres database, but the connection between the app and the database is broken for some reason…
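For anyone debugging the same symptom, a plain TCP check is a quick way to tell "the database is unreachable" apart from "the app itself is broken". The snippet below is only a minimal sketch, not what my app actually runs: the function name can_reach is made up for illustration, and the host my-db.internal and port 5432 in the comment are placeholders that assume a Postgres instance on its default port in the same private network. It only tests whether a TCP connection opens, so it says nothing about authentication or whether Postgres is healthy.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Hypothetical usage (replace the placeholder host with your own database address):
#   can_reach("my-db.internal", 5432)
```

Calling something like this at app startup and logging the result would at least make the failure visible in the logs instead of the app silently waiting on the database.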

I still can’t ssh into my main app though. I’ve tried fly wireguard reset, but it says Error: upstream service is unavailable.

Update 2: It seems this was an outage in the ams region, which has since been resolved. See: Can't reach database #ams #flycast