Load balancer issues in GRU? (re-post)

(Re-posting because someone made the previous post private.)

Good afternoon everyone, hope you’re all doing well.

Is there any instability currently affecting the load balancer in GRU?

My application is having trouble serving requests. Some requests go through, but shortly afterwards I start receiving a large number of 503 errors. No hard limits are configured and traffic is currently low, so this doesn’t seem justified.

Additionally, the app is not restarting: I’m connected to it with `fly ssh console` and haven’t been disconnected so far.

[PR03] could not find a good candidate within 40 attempts at load balancing. last error: [PR01] no known healthy instances found for route tcp/443. (hint: is your app shut down? is there an ongoing deployment with a volume or are you using the ‘immediate’ strategy? have your app’s instances all reached their hard limit?)

Is anyone aware of any ongoing issues?

Thanks in advance.

FYI: it seems to be fixed after running `fly scale count 0` and then scaling back up.
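If you need to repeat the scale-down/scale-up workaround described above, it can be scripted. This is a minimal, untested sketch in Python that assumes flyctl is installed as `fly` and already authenticated. The app name and the count to restore are hypothetical placeholders because the thread gives neither. By default it only prints the commands (dry run).

```python
import shlex
import subprocess

APP = "my-app"      # hypothetical: the thread never names the app
RESTORE_COUNT = 1   # hypothetical: how many Machines to bring back


def scale_cycle_commands(app: str, count: int) -> list[list[str]]:
    """Return the flyctl invocations: scale to zero, then back up to `count`."""
    if count < 1:
        raise ValueError("count must be at least 1, or the app stays down")
    return [
        ["fly", "scale", "count", "0", "--app", app],
        ["fly", "scale", "count", str(count), "--app", app],
    ]


def scale_cycle(app: str, count: int, dry_run: bool = True) -> None:
    """Print each command, and execute it only when dry_run is False."""
    for argv in scale_cycle_commands(app, count):
        print(shlex.join(argv))
        if not dry_run:
            # check=True raises on a non-zero exit, so a failed scale-down
            # stops the loop before the scale-up runs
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    scale_cycle(APP, RESTORE_COUNT)
```

Check the printed commands first, then call `scale_cycle(APP, RESTORE_COUNT, dry_run=False)` to run them. Scaling to zero takes the app offline until the second command completes.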

👋 Could you share the name of your app and roughly when this started happening?

Hey @PeterCxy, thanks for getting back to me.

I’m not very comfortable sharing the application name here, since it would expose the app’s fly.dev subdomain.

If it helps, the Machine ID was `0807560c161038` before I scaled down to 0.

We first noticed the issue around 3 PM (Brasília time).

After recreating the Machines, though, the issue seems to have been resolved.

Thanks, that’s enough for me to locate your app. From our internal logs, it looks like the host running your app’s Machine was having network issues, though I’m not sure why that went on for so long. In any case, my suggestion is to run multiple instances of your app whenever possible, so that a single-node issue like this doesn’t take the whole app down.
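To see why a second instance helps, here is a toy Python model of a proxy that probes a bounded number of candidates and gives up when none are healthy. It loosely echoes the `[PR03]`/`[PR01]` message quoted earlier. It makes simplifying assumptions: deterministic round-robin probing, a fixed health flag per instance, and hypothetical machine names. It is an illustration only, not a description of how Fly’s proxy actually selects instances.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    machine_id: str  # hypothetical names, not real Machine IDs
    healthy: bool


class NoCandidateError(Exception):
    """Raised when no healthy instance is found (the toy analogue of a 503)."""


def pick_instance(instances: list[Instance], attempts: int = 40, start: int = 0) -> Instance:
    """Toy round-robin balancer: probe up to `attempts` slots, return the first healthy one.

    This is NOT fly-proxy's real selection logic. It only mirrors the shape of
    the error quoted in the thread.
    """
    if not instances:
        raise NoCandidateError("no instances registered for this route")
    for i in range(attempts):
        candidate = instances[(start + i) % len(instances)]
        if candidate.healthy:
            return candidate
    raise NoCandidateError(f"could not find a good candidate within {attempts} attempts")


if __name__ == "__main__":
    # One instance on a host with network trouble: every request fails.
    single = [Instance("machine-a", healthy=False)]
    try:
        pick_instance(single)
    except NoCandidateError as err:
        print("503:", err)

    # Two instances: the healthy one keeps serving traffic.
    pair = [Instance("machine-a", healthy=False), Instance("machine-b", healthy=True)]
    print("served by", pick_instance(pair).machine_id)
```

With a single unhealthy instance, every attempt fails. With two instances, the healthy one still answers. On the platform itself, you set the number of instances with a command such as `fly scale count 2`.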

Of course, this doesn’t explain why the issue happened here. I’ll keep looking and let you know if I find anything.
