App resiliency feature round-up

andie · June 13, 2023, 6:08pm

Over the last few months, we’ve added some features that can help your apps be more resistant to hardware failures and outages.

We know that it can be difficult to get the big picture of what these features are and how they work, so we added a new reference doc:

Let us know if we missed anything or if we can make a topic easier to understand!

PS - Here are the Fresh Produce posts about the features that we cover in the doc:

Standby machines and creating two machines by default on first deploy
Automatically starting and stopping Machines based on traffic (to make it cheaper and easier to have redundancy)
Health check-based routing

ignoramous · June 13, 2023, 7:38pm

In our case, a scaled-down machine comes right back up (this has been the case for months now). Do we get a discount till the bug is fixed? I may be wrong but I think our bills could be 40% or so less than what we have been paying.

On failing readiness health checks, are requests routed to machines in other regions or in the same region? What if there aren't any machines in the same region, but there are some in other regions?

Thanks.

ben-io · June 13, 2023, 8:02pm

They will be routed to any healthy machine, even one in another region.
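Which machines count as "healthy" here comes from the service-level health checks defined in fly.toml. A minimal sketch, assuming the app declares an `[http_service]` and exposes a health endpoint; the port 8080 and the `/healthz` path are hypothetical placeholders, not values from this thread:

```toml
# Hypothetical example: internal_port and the /healthz path are placeholders.
[http_service]
  internal_port = 8080

  # Service-level HTTP check. Requests are routed away from Machines that
  # fail it and toward healthy Machines, which may be in other regions.
  [[http_service.checks]]
    grace_period = "10s"
    interval = "15s"
    method = "GET"
    timeout = "2s"
    path = "/healthz"
```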

andie · June 14, 2023, 1:10am

ignoramous wrote: "In our case, a scaled-down machine comes right back up (this has been the case for months now). Do we get a discount till the bug is fixed? I may be wrong but I think our bills could be 40% or so less than what we have been paying."

I don’t have a timeline or any info about this issue at the moment, but we haven’t forgotten about it.

andie · June 15, 2023, 2:02pm

While we’re looking into the reasons why auto start and stop might not be working in some cases, you can set both auto_stop_machines and auto_start_machines to false in the fly.toml file.
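As a sketch, assuming the app's traffic is defined in an `[http_service]` section (apps that use `[[services]]` blocks would set the same two keys there instead), turning both features off looks like this:

```toml
[http_service]
  internal_port = 8080        # placeholder; keep your app's existing value
  auto_stop_machines = false  # don't stop Machines when they go idle
  auto_start_machines = false # don't start stopped Machines on incoming requests
```

The change takes effect on the next deploy (for example, `fly deploy`).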

If you’re concerned that your usage was affected before turning the feature off, then you can contact billing@fly.io and let them know what happened.

ignoramous · June 15, 2023, 4:09pm

It isn't just auto-start / auto-stop. Machines have been waking up without any traffic, or just to serve a single connection. Our code then stops them because they're idle, and they're immediately spun back up again for a single connection or none at all. This cycle keeps repeating, and it has been happening ever since we began using Machines (Oct 2022; ref).

kurt · June 15, 2023, 6:19pm

That's actually auto_start_machines; it just defaults to true. The auto_stop_machines logic now attempts to stop as many Machines as it can and to concentrate requests/connections on as few as possible.
