last error: unreachable worker host. the host may be unhealthy. this is a Fly issue

aesmail · January 23, 2024, 11:46am

Our app just went down suddenly with the following error:

last error: unreachable worker host. the host may be unhealthy. this is a Fly issue.

Is there anything we should do to recover from this, or do we just wait for Fly to fix the issue on their side?

Thanks.

syx · January 23, 2024, 11:53am

I’m having the same issue on my Ruby Sinatra app. This is what I have in the logs:

2024-01-23T11:40:16.795 proxy[91857932a1e238] cdg [error] could not find a good candidate within 90 attempts at load balancing. last error: could not complete HTTP request to instance: operation was canceled: request has been canceled

2024-01-23T11:40:19.782 proxy[91857932a1e238] cdg [error] could not find a good candidate within 90 attempts at load balancing. last error: unreachable worker host. the host may be unhealthy. this is a Fly issue.

Tried restarting my machine but nothing seems to work. Is there an outage going on?
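For anyone trying to tell how many of their proxy errors are this host-level failure rather than an app-level one, here is a minimal sketch that scans log lines like the two above and counts the "unreachable worker host" errors per region. The line layout in the regex is an assumption inferred only from those two quoted lines, not a documented Fly log format, and the names `parse_line` and `count_host_errors` are hypothetical helpers.

```python
import re
from collections import Counter

# Assumed layout, inferred only from the two log lines quoted above:
# <timestamp> <source>[<machine id>] <region> [<level>] <message>
LOG_LINE = re.compile(
    r"^\s*(?P<timestamp>\S+)\s+"
    r"(?P<source>\w+)\[(?P<machine>[0-9a-f]+)\]\s+"
    r"(?P<region>[a-z]{3})\s+"
    r"\[(?P<level>\w+)\]\s*"
    r"(?P<message>.*)$"
)

# Substring that marks the host-level failure discussed in this thread.
HOST_ERROR = "unreachable worker host"


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def count_host_errors(lines):
    """Count lines per region whose message mentions the host-level error."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and HOST_ERROR in entry["message"]:
            counts[entry["region"]] += 1
    return counts


sample = [
    "2024-01-23T11:40:16.795 proxy[91857932a1e238] cdg [error] could not find a good candidate within 90 attempts at load balancing. last error: could not complete HTTP request to instance: operation was canceled: request has been canceled",
    "2024-01-23T11:40:19.782 proxy[91857932a1e238] cdg [error] could not find a good candidate within 90 attempts at load balancing. last error: unreachable worker host. the host may be unhealthy. this is a Fly issue.",
]

print(count_host_errors(sample))
```

Only the second sample line carries the host-level message; the first is a canceled request, so it is not counted.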

kgf1980 · January 23, 2024, 11:55am

I started getting errors from my Rails app connecting to PSQL (all on Fly) around 20 minutes ago (11:30). Monitoring suggests the PSQL instance isn’t accepting connections, and I can’t proxy from my shell to the PSQL instance over Fly either.

I’m also getting lots of errors on the Fly dashboard: monitoring keeps disconnecting, machine details can’t be retrieved, and I see “Failed to establish connection to NATS server”.

Everything is hosted in the LHR region.

Update: this is what I’m getting in my Rails logs when it tries to start:

2024-01-23T12:02:32Z app[9080010c614158] lhr [info]PG::ConnectionBad: connection to server at "fdaa:2:5f84:0:1::2", port 5432 failed: server closed the connection unexpectedly
2024-01-23T12:02:32Z app[9080010c614158] lhr [info]     This probably means the server terminated abnormally
2024-01-23T12:02:32Z app[9080010c614158] lhr [info]     before or while processing the request.

iam-kevin · January 23, 2024, 12:15pm

Not sure if it’s related, but is anyone able to open their dashboard? I’ve been getting Error 500 for a while, but every other part of fly.io (like the docs and status) seems to be working fine.

andykent · January 23, 2024, 12:18pm

There is definitely something up in LHR. 4 of our 6 machines there are totally unreachable and the CLI is returning 500s when trying to scale machines there.

Other regions we are running in seem fine currently though.

cooperx86 · January 23, 2024, 12:20pm

Yes, a few apps have been down in LHR for the past hour. Restarting the machines involved hasn’t worked. I’ve had a good several months with no problems, though I do sometimes wonder about moving out of the LHR region for what I’m doing, because it seems the touchiest.

matthewford · January 23, 2024, 12:30pm

We’re also seeing issues with LHR with 8 of our apps, across two orgs.

andykent · January 23, 2024, 1:07pm

Seems like things are coming back now.

kiancross · January 23, 2024, 1:38pm

Are things working for other people? All our apps are still currently down.

Is there any way for us to mitigate this going forward? Why aren’t any of the fly.io status pages updated to show this downtime?

aesmail · January 23, 2024, 2:14pm

I had to run fly deploy again for the service to come back up.

kiancross · January 23, 2024, 5:49pm

Is there any way we can mitigate this other than a multi-region deployment?

system · January 30, 2024, 9:38pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
