Our app currently fails to deploy with the error “Failed due to unhealthy allocations”, and the rollback failed with the same error. Can someone look into this ASAP? I assume this is a platform issue?
After some more digging, this looks very similar to previously reported issues where a VM might not have been shut down correctly and, because of the volume, the new VM can’t start. Could that be what is happening here?
I definitely need someone from Fly to look at this: it is currently blocking dev deployments, which in turn blocks important prod deployments.
This looks like a delay in scheduling caused by temporary capacity issues. The host your volume is attached to had a burst of usage. When you deployed, the previous VM was stopped, but then the new VM couldn’t reserve space to start, and because the volume lives on that host, the new VM couldn’t be placed anywhere else. After some time, the capacity pressure cleared and Nomad was able to start a new VM.
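To make that sequence a bit more concrete, here is a toy Python model of it. It is only a rough sketch, not how Nomad actually schedules anything: `Host`, `stop_then_start_deploy`, and the capacity numbers are all hypothetical. It assumes the old VM holds 2 units on a host with 10 units of capacity, and that the volume pins the replacement VM to that same host.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical host with a fixed amount of schedulable capacity (arbitrary units)."""
    capacity: int
    used: int = 0

    def reserve(self, amount: int) -> bool:
        # Only succeeds if enough free capacity is left on this host.
        if self.used + amount > self.capacity:
            return False
        self.used += amount
        return True

    def release(self, amount: int) -> None:
        self.used -= amount


def stop_then_start_deploy(host: Host, vm_size: int, burst: int) -> bool:
    """Toy model: stop the old VM, let other workloads burst, then place the new VM.

    Assumes the old VM currently holds `vm_size` units on `host` and that the
    volume pins the new VM to the same host. Returns True if the new VM fits.
    """
    host.release(vm_size)         # 1. the previous VM is stopped, freeing its slot
    host.reserve(burst)           # 2. other workloads on the host grab free capacity
    return host.reserve(vm_size)  # 3. the new VM must fit on this same host


# 8 of 10 units in use: the old VM (2) plus 6 units of other workloads.
host = Host(capacity=10, used=8)
print(stop_then_start_deploy(host, vm_size=2, burst=3))  # False: the deploy fails

# Later the burst subsides and the same reservation succeeds.
host.release(3)
print(host.reserve(2))  # True
```

The key point the model captures is the ordering: the old VM gives up its slot before the new one has secured its own, so a burst in between leaves nothing running.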
This is a rough edge case for Nomad apps. There are two ways you may be able to work around this problem:
- Run two VMs + volumes at all times. If you care about uptime for your application, you should run 2+ instances. If you can tolerate issues like this, one instance is fine.
- Run a Machine-based app instead. The way Machines are architected mitigates this somewhat: updating an existing Machine doesn’t do the whole capacity dance, it just restarts. In this particular situation, a Machine would have updated just fine. That’s not always true, though; the first bullet is still the most reliable option.
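The two workarounds can be compared with a similar toy model. Again, this is a hypothetical Python sketch, not Fly or Nomad code: it assumes each instance’s volume sits on its own host, that a rolling deploy updates one instance at a time and stops at the first failure, and that an in-place Machine restart always keeps its reservation and succeeds. That last assumption is exactly the “not always true” part, which is why running two instances remains the more reliable option.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Host:
    """Hypothetical host with a fixed amount of schedulable capacity (arbitrary units)."""
    capacity: int
    used: int = 0

    def reserve(self, amount: int) -> bool:
        # Only succeeds if enough free capacity is left on this host.
        if self.used + amount > self.capacity:
            return False
        self.used += amount
        return True

    def release(self, amount: int) -> None:
        self.used -= amount


VM_SIZE = 2  # capacity held by one app VM (hypothetical)
BURST = 3    # extra capacity other workloads try to grab during the deploy (hypothetical)


def replace_vm(host: Host) -> bool:
    """Nomad-style update: free the old VM's slot, then reserve a new one."""
    host.release(VM_SIZE)
    host.reserve(BURST)           # the burst can take the freed slot
    return host.reserve(VM_SIZE)


def restart_in_place(host: Host) -> bool:
    """Machine-style update: keep the reservation and just restart the VM.

    Simplification: this assumes the restart itself always succeeds.
    """
    host.reserve(BURST)           # the burst only gets capacity that was already free
    return True


def rolling_deploy(hosts: List[Host], update: Callable[[Host], bool]) -> int:
    """Update one instance (one per host) at a time, stopping at the first
    failure. Returns how many instances are running afterwards."""
    running = len(hosts)
    for host in hosts:
        if not update(host):
            running -= 1  # this instance could not be placed again
            break         # the remaining instances keep running untouched
    return running


def busy_host() -> Host:
    # Old VM (2 units) plus 6 units of other workloads on a 10-unit host.
    return Host(capacity=10, used=8)


print(rolling_deploy([busy_host()], replace_vm))               # 0: outage
print(rolling_deploy([busy_host(), busy_host()], replace_vm))  # 1: still serving
print(rolling_deploy([busy_host()], restart_in_place))         # 1: still serving
```

With two instances the deploy itself still fails in this model; the difference is that one instance keeps serving traffic while it does.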
As an aside, when you need help with a specific app, the forums may not work well, because we don’t see every thread here. For support with apps you care about, the Launch plan + email support will work a lot better.