Booting second instance of app occasionally causes Uptime Robot to report 503

paulrudy · June 20, 2024, 10:20am

I posted a question a while back about this, and found a little more info.

I have an app on two machines in different regions, with

auto_stop_machines = true
auto_start_machines = true

Occasionally (but not always), when the secondary machine starts after being stopped for a while, Uptime Robot reports a 503 error. The error seems to land in the window between the secondary machine starting and its app finishing startup.

I’m guessing that, in these instances, it’s the ping from Uptime Robot that triggers the secondary machine to start, based on whatever load balancing Fly is doing.

Is there a way to prevent this behavior, so that as calls to the running machine get close to the threshold for starting another machine, they’re still routed to the running machine until the next machine is fully ready?

roadmr · June 20, 2024, 5:34pm

Hi Paul,

It’s likely your app does not define an HTTP health check. Without one, the proxy will start forwarding connections to your app as soon as the machine has booted. If the app takes a bit of time to initialize, it won’t actually respond to the request, and this might result in the 503 error from the Fly proxy.

If you define a health check, though, it will only pass when your app is fully up and ready to serve traffic. The proxy takes that into account and won’t start sending requests to the machine until the health check passes.

Daniel

paulrudy · June 20, 2024, 7:57pm

Thank you so much for this. That sounds very likely.

I added an [[http_service.checks]] block and will see if Uptime Robot stops complaining.
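For reference, a minimal sketch of what such a block might look like in fly.toml. The block actually used was not posted in this thread, so the path and timing values below are illustrative assumptions; `/` stands in for whatever path the app answers with a 200 once it is ready.

```toml
[[http_service.checks]]
  # All values here are assumptions for illustration; tune them to the app.
  grace_period = "10s"   # time allowed after machine start before checks count
  interval = "30s"       # how often the check runs
  timeout = "5s"         # how long to wait for a response
  method = "GET"
  path = "/"             # hypothetical; point this at a real readiness endpoint
```

A grace period roughly matching the app's startup time seems like a reasonable starting point, since the check should only pass once the app can actually serve requests.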

To be a little more thorough about checking whether this worked, I’d like to stop the secondary machine, then do fly machine start <id> && curl <url for app health check> in order to see if there is any gap in the health-check timing that permits a 503 to come through.

I’d need to specifically curl the secondary machine’s app instance (by machine id or by region). Is there any way to do that?
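The timing check described above could be sketched as a polling loop rather than a single curl, since one request only samples one moment of the startup window. This is a minimal sketch under assumptions: the function names (`fetch_status`, `poll`, `failures`) are made up for illustration, and it does not solve the targeting question, because requests still go through Fly's normal routing. It would be run right after `fly machine start <id>`.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is still what we want.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None


def poll(url, attempts=30, delay=0.5, fetch=fetch_status, sleep=time.sleep):
    """Request url repeatedly; return a list of (seconds_elapsed, status)."""
    samples = []
    start = time.monotonic()
    for i in range(attempts):
        samples.append((time.monotonic() - start, fetch(url)))
        if i < attempts - 1:
            sleep(delay)
    return samples


def failures(samples):
    """Keep only samples with no response or a 5xx status (e.g. a 503)."""
    return [(t, s) for t, s in samples if s is None or s >= 500]


# Example use (the URL is a placeholder):
# for elapsed, status in poll("https://<app-url>/<health-path>"):
#     print(f"{elapsed:6.2f}s  {status}")
```

If `failures(...)` comes back empty across several stop/start cycles, that suggests (but does not prove) that the health check is closing the gap.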

paulrudy · June 22, 2024, 12:58am

Following up: @roadmr's suggestion seems to have done the trick. No more warnings from Uptime Robot since defining a health check. Thank you!

Still curious about the above, though. Would be interesting to know if that’s possible.

system · June 29, 2024, 12:59am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
