My prod apps have been down for a few hours now:
[PR03] could not find a good candidate within 21 attempts at load balancing. last error: [PU03] unreachable worker host. the host may be unhealthy. this is a Fly issue.
I believe that happens when the underlying host is at max capacity.
Are your apps single instances, and/or all in Chicago (ord)?
If the issue is indeed that the underlying host has no capacity, I’d expect machines to be unable to start on it.
Perhaps you’ve already tried these, but I’d try:
fly scale count 2 --region ord
fly scale count 2 --region bos
I don’t know your setup, though (e.g. whether you have any volumes attached).
Scaling to another region has indeed fixed the issue:
❯ fly scale count 2 --region bos --config hh-api.toml
App 'hh-api' is going to be scaled according to this plan:
+2 machines for group 'app' on region 'bos' of size 'shared-cpu-1x'
? Scale app hh-api? Yes
Executing scale plan
Created d89d0c495398 group:app region:bos size:shared-cpu-1x
Created d8d46b025028 group:app region:bos size:shared-cpu-1x
! WARNING: There are active host issues affecting your app. Please check `fly incidents hosts list` or visit your app in https://fly.io/dashboard
Thanks for the guidance!
Glad my hypothesis helped. So it was a capacity issue in ord (and probably only on that specific worker host).
If being in Chicago is important to you, you could now try deleting the old machines in ord and spinning up new ones there (again, assuming you have no volumes attached or other considerations).
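For anyone landing here later, this is a rough sketch of that "delete the old machines, then scale back up" sequence, not an official procedure. It assumes no volumes are attached. The `run` and `relocate_plan` helpers are made up for illustration, as is the `DRY_RUN` switch, and the app name and machine IDs are whatever you pass in. By default it only prints the `fly` commands it would run, so you can review the plan first. There is no guarantee the new machines land on a different host.

```shell
#!/bin/sh
# Sketch only: replace machines stuck on an unhealthy host with fresh ones
# in the same region. Assumes no volumes are attached to the old machines.
# `run` and `relocate_plan` are hypothetical helpers, not part of flyctl.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; 0 = actually run them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# usage: relocate_plan APP REGION COUNT OLD_MACHINE_ID...
relocate_plan() {
  app="$1"; region="$2"; count="$3"
  shift 3

  # Look before touching anything: host incidents, volumes, current machines.
  run fly incidents hosts list --app "$app"
  run fly volumes list --app "$app"
  run fly machine list --app "$app"

  # Remove the old machines (the IDs are supplied by the caller).
  for id in "$@"; do
    run fly machine destroy "$id" --force --app "$app"
  done

  # Ask for fresh machines in the region, then check the result.
  run fly scale count "$count" --region "$region" --app "$app" --yes
  run fly status --app "$app"
}
```

Something like `relocate_plan hh-api ord 2 <old-id-1> <old-id-2>` prints the plan. Set `DRY_RUN=0` once the volumes listing comes back empty and you're happy with it. If you scaled into bos as a stopgap, `fly scale count 0 --region bos` should remove those machines again afterwards.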
Was just doing that as you replied
Moving it back to Chicago seems to be working fine now.
Thanks again for the guidance, much appreciated.