[PR03] could not find a good candidate within 21 attempts at load balancing. last error: [PU03] unreachable worker host

ben_h · January 27, 2025, 11:02am

My prod apps have been down for a few hours now:

[PR03] could not find a good candidate within 21 attempts at load balancing. last error: [PU03] unreachable worker host. the host may be unhealthy. this is a Fly issue.

mabis · January 27, 2025, 1:20pm

I believe that happens when the underlying host is at max capacity.

Are your apps single instances? And/or are they all in Chicago?

ben_h · January 27, 2025, 1:21pm

Many thanks. They are all single instances in ORD. I am also unable to start or stop the machine.

mabis · January 27, 2025, 1:25pm

If the issue is indeed that the underlying host has no spare capacity, I'd expect machines not to be able to start on it.

Perhaps you've already tried these, but I'd try to:

- scale up the app and see whether the new, additional machine is assigned to a host with capacity: `fly scale count 2 --region ord`
- if doing so in the same region (ord) fails, try another region (I picked bos as a random other US region), e.g. `fly scale count 2 --region bos`

But I don't know your setup (e.g. do you have any volumes?).

ben_h · January 27, 2025, 1:31pm

Scaling to another region has indeed fixed the issue:

❯ fly scale count 2 --region bos --config hh-api.toml
App 'hh-api' is going to be scaled according to this plan:
  +2 machines for group 'app' on region 'bos' of size 'shared-cpu-1x'
? Scale app hh-api? Yes
Executing scale plan
  Created d89d0c495398 group:app region:bos size:shared-cpu-1x
  Created d8d46b025028 group:app region:bos size:shared-cpu-1x
! WARNING: There are active host issues affecting your app. Please check `fly incidents hosts list` or visit your app in https://fly.io/dashboard

Thanks for the guidance!

mabis · January 27, 2025, 1:37pm

Glad my hypothesis helped. So it was a capacity issue in ord (and probably limited to that specific worker host).

If being in Chicago is important to you, you could now try deleting the old machines in ord and spinning up new ones there (again, assuming you have no volumes attached and no other constraints).
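A minimal shell sketch of that relocation, assuming a stateless app with no volumes, the `hh-api` app name from the output above, and that each command accepts flyctl's usual `--app` flag (worth confirming with `fly <command> --help` on your version). The `run` wrapper and `relocate_to_ord` function are hypothetical helpers; with the default `DRY_RUN=1` the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: recreate a stateless (no volumes) app's machine in ord after a host issue.
# Assumptions (hypothetical): app name hh-api, flyctl's standard --app flag,
# IDs of the stuck ord machines passed as arguments.
# DRY_RUN=1 (the default) only prints commands; DRY_RUN=0 executes them.
set -eu

APP="${APP:-hh-api}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

relocate_to_ord() {
  # 1. Destroy the machines stuck on the unhealthy ord host (--force stops them first).
  for id in "$@"; do
    run fly machine destroy "$id" --force --app "$APP"
  done
  # 2. Ask for one machine in ord again; it should be placed on a healthy host.
  run fly scale count 1 --region ord --app "$APP"
  # 3. Check whether Fly still reports host issues affecting the app.
  run fly incidents hosts list --app "$APP"
}

relocate_to_ord "$@"
```

Usage: `DRY_RUN=0 sh relocate.sh <old-ord-machine-id>`, taking the IDs from `fly machine list --app hh-api`. `fly scale count` asks for confirmation, as in the output above. The temporary bos machines are left alone here; once the new ord machine is healthy, they can be removed with `fly machine destroy` as well.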

ben_h · January 27, 2025, 1:40pm

I was just doing that as you replied.

Moving it back to Chicago seems to be working fine now.

Thanks again for the guidance, much appreciated.

system · February 3, 2025, 1:40pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
