Could not find a good candidate at load balancing (outage?)

Seems like some system is out again? My apps in Stockholm are dying after a deploy.

error.message="could not find a good candidate within 90 attempts at load balancing" 2024-04-08T21:26:09Z proxy[3d8dd969b7d789] fra [error]request.method="GET" request.url="" request.id="01HTZSJRR71Q915M8CWYWBNB54-fra" response.status=502

I’m in Sydney and all my machines are out after a deploy.

Notably, the fly.io dashboard is also throwing 500 errors occasionally and loading super slowly for me.

Same here. App dead after deploy, getting 504 errors. Dashboard super slow and 500s all over the place.

Sorry for the inconvenience. We are currently investigating the issue.

Everything’s back to normal for me at least :slight_smile:

Thank you!

Yep the resolution worked for me :slight_smile:

Thanks for the quick turnaround team!

@kaz Left the issue for the night hoping it’d be resolved by morning. But sadly I’m still getting the same load balancing timeout and 502s. Tried re-deploying. Any ideas what I can do?

Edit: I even deleted and re-created the app. Still no success.

2024-04-09T10:10:54Z proxy[3d8dd969b7d789] arn [error]could not find a good candidate within 90 attempts at load balancing
2024-04-09T10:10:55Z proxy[3d8dd969b7d789] arn [error]could not find a good candidate within 90 attempts at load balancing
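For anyone triaging the same thing, here is a minimal sketch for tallying these proxy errors per region from saved log output. It assumes every line follows the shape of the two lines above (timestamp, `proxy[<id>]`, three-letter region code, `[level]`, message). The regex and the `count_lb_errors` helper are my own illustration, not anything provided by Fly, and lines in another layout are simply skipped.

```python
import re
from collections import Counter

# Assumed line shape, taken from the two log lines quoted above:
# <timestamp> proxy[<id>] <region> [<level>]<message>
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+proxy\[(?P<id>[0-9a-f]+)\]\s+"
    r"(?P<region>[a-z]{3})\s+\[(?P<level>\w+)\](?P<msg>.*)$"
)
NEEDLE = "could not find a good candidate"


def count_lb_errors(lines):
    """Count load-balancing errors per region (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Only count error-level lines carrying the load-balancing message.
        if m and m.group("level") == "error" and NEEDLE in m.group("msg"):
            counts[m.group("region")] += 1
    return counts


sample = [
    "2024-04-09T10:10:54Z proxy[3d8dd969b7d789] arn [error]could not find a good candidate within 90 attempts at load balancing",
    "2024-04-09T10:10:55Z proxy[3d8dd969b7d789] arn [error]could not find a good candidate within 90 attempts at load balancing",
]
print(count_lb_errors(sample))
```

A tally like this makes it easier to tell support whether the errors are confined to one region or spread across several.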

Hi @YungTarps , I’m looking into your issue. I’ll update you here when I know more.
Sorry for the inconvenience.

@aschiavo I’ve sent an email using our organisation contact email. Can you tell if there’s a fly-wide incident going on?

Pretty frustrated at this point. :frowning: Is there any information I could give you that would be of help @aschiavo?

Hi @YungTarps, sorry for the slow response. We are actively looking into your issue and will let you know when it’s resolved.
Again, thanks for your patience.

Brutal. This continues to happen and it’s hurting my business in a material way. I think I’ve been hit by over an hour of downtime after trying to push a deploy in the last 7 days alone. I set up multi-region scaling and blue-green deploys, yet an issue during a deploy still takes my app down. This is very disappointing.

Please… I really don’t want to migrate, but I think I have to unless we see a real plan to resolve this.

Hey @YungTarps

The balancing problems you were having with your app should be fixed now.

No, and in fact they’re getting worse.

a) I cannot deploy, it always times out (I emailed support, no reply)
b) It keeps taking my machines out of service leaving nothing healthy behind the load balancer

I resolved this briefly by deploying into a THIRD region (ord), but clearly this isn’t a solution…

W T F

Not sure if it’s related or not, but I cannot deploy: my machines hang at “preparing docker image”.

I’m also running into issues, and in fact am seeing this on my own dashboard:

image

Even though, disconcertingly, status.flyio.net is all green.

Yep, this is consistent with my experience. I guess this is just another outage.

I get that this company wants to grow, but this isn’t the way to do it.

For anyone else facing this: I haven’t had a reply from support yet, which is disappointing, but I did manage to resolve this on my own.

Steps were:

  1. Manually force destroy the deployment machine
  2. Manually force deploy the machine listed as “replacing”
  3. Deploy

This has me paranoid that the same thing will happen to my db instance, so sadly, migration is at the top of our next iteration. Peace out fly…

Thanks a ton! It is indeed fixed. I hope it was something that will now remain fixed :slight_smile:

Looks like my issue is now fixed as well. Deployments are working again. Unsure if it was related or not.