We fixed a nasty deployment bug for apps with health checks

pavel · April 26, 2024, 2:14pm

A short while ago we introduced a bug that caused (under very specific but racy circumstances) machines with health checks to disappear from fly-proxy roster shortly after deployment and no longer being considered by the proxy during routing.

The symptoms of the bug were: not all of the machines in an app served requests, or, if your app had only one machine, the app suddenly became unavailable after deploy until the machine is restarted.

The bug affected only newly created machines or machines with newly added health checks. The health check also had to be configured with very small grace_period to trigger the bug.

Any change to the machine (start → stop, stop → start) or health check status (critical → passing) automagically “fixed” the problem and made the machine available for routing again.

The bug mostly affected blue/green deployments because:

they require health checks
they always create new machines

Well, this should be fixed now!

Topic		Replies	Views
App failing health check doesn't receive another health check Questions / Help	2	14	April 7, 2025
Fly app does not rollback after a failed deploy	2	453	August 25, 2023
We now have custom Machine based health checks! Fresh Produce	2	1164	May 29, 2024
I cannot deploy my project, because my project fails health checks rails , machines	2	38	April 2, 2025
Health checks on Machines Questions / Help wishlist	4	1272	May 4, 2024

We fixed a nasty deployment bug for apps with health checks

Related topics