Booting second instance of app occasionally causes Uptime Robot to report 503

I posted a question about this a while back and have since found a little more info.

I have an app on two machines in different regions, with

auto_stop_machines = true
auto_start_machines = true

Occasionally (but not always), when the secondary machine starts after having been stopped for a while, Uptime Robot reports a 503 error. The error seems to land right in the window between the secondary machine starting and its app finishing startup.

I’m guessing that, in these instances, it’s the ping from Uptime Robot that triggers the secondary machine to start, based on whatever load balancing Fly is doing.

Is there a way to prevent this behavior, so that as calls to the running machine get close to the threshold for starting another machine, they’re still routed to the running machine until the next machine is fully ready?

Hi Paul,

It’s likely your app does not define an HTTP health check. Without one, the proxy starts forwarding connections to your app as soon as the machine has booted. If the app takes a bit of time to initialize, it won’t actually respond to the request, and this can result in a 503 error from the Fly proxy.

If you define a health check, though, it will only pass once your app is fully up and ready to serve traffic. The proxy takes that into account and won’t start sending requests to the machine until the health check passes.

  • Daniel

Thank you so much for this. That sounds very likely.

I added an [[http_service.checks]] block and will see if Uptime Robot stops complaining.
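
For anyone landing here later, a block of this kind in fly.toml might look roughly like the sketch below. This is not my exact config: the internal_port value, the /health path, and all of the timing values are assumptions for illustration, so adjust them to whatever your app actually exposes and how long it takes to start.

```toml
[http_service]
  internal_port = 8080          # assumption: the port your app listens on
  auto_stop_machines = true
  auto_start_machines = true

  [[http_service.checks]]
    grace_period = "10s"        # assumption: time allowed for the app to boot before checks count
    interval = "30s"            # assumption: how often the check runs
    timeout = "5s"              # assumption: how long to wait for a response
    method = "GET"
    path = "/health"            # hypothetical endpoint; use a route that only returns 200 once the app is ready
```

The check is only useful if the endpoint fails (or doesn't answer) until the app can really serve traffic, so a route that responds before initialization finishes would defeat the purpose.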

To be a little more thorough about checking whether this worked, I’d like to stop the secondary machine, then do fly machine start <id> && curl <url for app health check> in order to see if there is any gap in the health-check timing that permits a 503 to come through.

I’d need to specifically curl the secondary machine’s app instance (by machine id or by region). Is there any way to do that?

Following up: @roadmr's suggestion seems to have done the trick. No more warnings from Uptime Robot since defining a health check. Thank you!

Still curious about the above, though. Would be interesting to know if that’s possible.
