New blue-green deployments failing - machines never passing healthchecks

wjordan · January 16, 2025, 12:29am

Hey folks, just wanted to add some details to provide a bit more clarity to this thread:

We identified a specific platform issue that was causing healthchecks in blue-green deployments to fail. We tracked it down to a change that we had been slowly rolling out to a handful of regions over the last couple of days: scl, mia, bom, gig, bog, eze, gdl, yul, otp, and a small portion of sin at 2025-01-13T21:30:00Z [edit: followed by ewr, lax, lhr, hkg, jnb, arn, atl, bos, cdg, den, dfw at 2025-01-14T21:30:00Z]. We reverted the change in these regions and confirmed that this fixed the issue. No other regions were affected.

That said, I strongly suspect that ~~the majority of reported~~ [edit: any remaining] issues in this thread are more directly related to the CPU Quotas Update that we initially announced last October and finished rolling out yesterday. If your app runs on shared instances and uses a lot of CPU on startup (more than the 1/16th, or 6.25%, of a core we allocate to each shared vCPU), it may take longer to boot under the quota, and health checks during deployments may take longer to pass as a result. You may need to adjust your health-check or deploy-wait timeouts, and/or scale up your instances to match your app’s workload.
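
To get a rough feel for what the 1/16th figure can mean for boot time, here is a minimal back-of-the-envelope sketch. It assumes the app is held to exactly the baseline quota for the whole startup (no burst allowance) and that startup work spreads evenly across shared vCPUs. Both are simplifying assumptions, not a description of how the platform schedules CPU, and the function names are hypothetical.

```python
# Rough estimate only. Assumes a flat 1/16-core baseline per shared vCPU,
# no burst allowance, and startup work that parallelizes evenly.
SHARED_VCPU_QUOTA = 1 / 16  # 6.25% of a core, the figure quoted in the post


def estimated_boot_seconds(cpu_seconds_needed: float, shared_vcpus: int = 1) -> float:
    """Wall-clock seconds to do `cpu_seconds_needed` of CPU work at the baseline quota."""
    if shared_vcpus < 1:
        raise ValueError("shared_vcpus must be at least 1")
    return cpu_seconds_needed / (SHARED_VCPU_QUOTA * shared_vcpus)


def fits_within(cpu_seconds_needed: float, shared_vcpus: int, timeout_seconds: float) -> bool:
    """True if the estimated boot time is within the given health-check window."""
    return estimated_boot_seconds(cpu_seconds_needed, shared_vcpus) <= timeout_seconds


if __name__ == "__main__":
    # An app that burns 5 CPU-seconds on startup, on one shared vCPU:
    print(estimated_boot_seconds(5.0, 1))  # 80.0
    # The same app against a 60-second window, on one and then two shared vCPUs:
    print(fits_within(5.0, 1, 60.0))  # False
    print(fits_within(5.0, 2, 60.0))  # True
```

Under these assumptions, a startup that needs 5 CPU-seconds would take about 80 seconds on a single shared vCPU, which is why a timeout that used to be comfortable may now be too short.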

Hope this extra info is helpful and unblocks those still experiencing various issues.
