I’m currently having trouble deploying with flyctl (I’m on the latest version). When I try to deploy, I see something like this:
==> Creating release
Release v7078 created
You can detach the terminal anytime without stopping the deployment
Monitoring Deployment
30 desired, 18 placed, 8 healthy, 1 unhealthy [health checks: 18 total, 18 passing]
v7078 failed - Failed due to unhealthy allocations - rolling back to job version 7077
23 desired, 5 placed, 0 healthy, 1 unhealthy [health checks: 2 total, 1 passing]
v7079 failed - Failed due to unhealthy allocations - not rolling back to stable job version 7079 as current job has same specification
Failed Instances
I can’t see why the instances are showing as unhealthy, so it’s difficult to fix.
The strange behaviour I’m seeing (via fly status --watch) is that the deploy fails, and then the number of running instances (the target is 30) slowly drains down to 3 or 4. When I deploy again, they shoot back up to ~30 of the previous version, which then start to be replaced by the new version… which fails, and then they start to drop off again.
Your help would be greatly appreciated, as right now the app/site is down.
Update: it’s now back up, but running version 7078, which showed as failed to deploy. I’m not sure whether this is a bug in my code or something to do with the deploy itself.
These were likely transient failures on our end. We’ve had heisenbugs causing VM failures on busy, global apps this week, especially those running in Chennai and Sydney.
You can add this to your config to prevent rollbacks, which will help some:
[experimental]
auto_rollback = false
If a random VM failure happens with that set, the deploy will stop but remain staged. When that happens, you can run fly vm stop <id> for any older instances that are still running.
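If several older instances are still running after a staged deploy stops, a small shell loop saves typing fly vm stop <id> over and over. This is a minimal sketch, assuming you have already copied the older instances’ IDs from fly status by hand; the function name stop_old_vms and the DRY_RUN switch are hypothetical and not part of flyctl.

```shell
#!/bin/sh
# Hypothetical helper (not part of flyctl): stop every instance whose ID is passed as an argument.
# Assumes the IDs of the older instances were copied by hand from `fly status`;
# nothing here parses flyctl output.
# Set DRY_RUN=1 to print the commands instead of running them.
stop_old_vms() {
  for id in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "fly vm stop $id"
    else
      fly vm stop "$id"
    fi
  done
}

# Run with the instance IDs as arguments, e.g.: DRY_RUN=1 sh stop_old_vms.sh <id1> <id2>
stop_old_vms "$@"
```

Running it with DRY_RUN=1 first lets you check the list of IDs before anything is actually stopped.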
Ah, thanks for the heads-up. I was planning to do a number of deploys this week to test out various tuneable settings, but I’ll hold off. Sorry to ask the annoying question, but do you have any idea when things might get a little more stable on your side?
Oh, FYI, I tried scaling down from 30 to 20 with fly scale count 20, and it took the app offline again (release 7080). A redeploy from my side (7081) seems to have brought everything back up. This also seems to be the first “successful” release I’ve attempted: