Deploys stuck in `pending`

Hi! We’re on the hobby plan, so I’m starting a thread here instead of emailing. We’re very happy with the service so far; we’re a small non-profit and currently only run a small service on the platform.

We don’t seem to be able to deploy new app container releases.

One app container was apparently shut down by the platform yesterday and never recovered or restarted. Deploying it again doesn’t seem to do anything: it stays in pending and the release shows in the Activity sidebar, but the Monitoring page only shows the logs from the last working release, and no container is shown from the old one.

Tried deploying a new release of another app we have running, and it also just increments the release number without a new container starting. The old container is still running.

It’s normal for a deploy with no changes to increment the version number and not restart.

`fly status --all` may show you why the previous VM shut down. If a VM fails repeatedly, there’s a backoff interval before we replace it again. You can see specifics by running `fly vm status <id>`.

Thanks for your reply!

I’ve made sure there are changes each deploy. I did manage to catch something though.

I deployed and then watched the deployment (6debe594-93bf-defb-f57a-9f6209a08c82) status stay in pending for 20 minutes before an instance appeared (ce8a1435). After about 5 minutes of the deployment running, the deployment status and instance both disappeared. Since I got the instance ID this time, I could run `vm status`, and it says:

Events
TIMESTAMP               TYPE            MESSAGE                                            
2023-02-03T17:59:28Z    Received        Task received by client                            
2023-02-03T17:59:28Z    Task Setup      Building Task Directory                            
2023-02-03T18:04:28Z    Alloc Unhealthy Task not running by deadline                       
2023-02-03T18:04:34Z    Killing         Sent interrupt. Waiting 5s before force killing    
2023-02-03T18:04:37Z    Template        Missing: vault.read(apps/data/93705/volume_encryption_key)
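
For anyone comparing their own output: a rough sketch of how the timestamps above can be read, assuming the same column layout (timestamp first, then type and message). The helper names `parse_events` and `seconds_between` are made up for illustration and are not part of flyctl. It only measures the gap between "Task Setup" and "Alloc Unhealthy" in the pasted events.

```python
from datetime import datetime

# Event lines copied from the `vm status` output above.
EVENTS = """\
2023-02-03T17:59:28Z    Received        Task received by client
2023-02-03T17:59:28Z    Task Setup      Building Task Directory
2023-02-03T18:04:28Z    Alloc Unhealthy Task not running by deadline
2023-02-03T18:04:34Z    Killing         Sent interrupt. Waiting 5s before force killing
2023-02-03T18:04:37Z    Template        Missing: vault.read(apps/data/93705/volume_encryption_key)
"""


def parse_events(text):
    """Return (timestamp, rest_of_line) pairs, one per non-empty line."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The first whitespace-separated field is the timestamp.
        stamp, rest = line.split(None, 1)
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        events.append((when, rest.strip()))
    return events


def seconds_between(events, start_marker, end_marker):
    """Seconds from the first event containing start_marker
    to the first event containing end_marker."""
    start = next(t for t, rest in events if start_marker in rest)
    end = next(t for t, rest in events if end_marker in rest)
    return (end - start).total_seconds()


events = parse_events(EVENTS)
gap = seconds_between(events, "Task Setup", "Alloc Unhealthy")
print(f"Task Setup -> Alloc Unhealthy: {gap:.0f}s")
```

On the events pasted above, the gap is exactly five minutes, which lines up with the instance disappearing after about 5 minutes. The last event mentions a missing Vault read for the volume encryption key, so the task may never have started because of that, though that is a guess from the log alone.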

This is exactly what’s happening to me as well, but I have shipped code changes every time. I see the pending apps for several hours before finally trying to fix it myself and getting it to work by scaling down and back up.

Instance
  ID            = 41ca3525

TIMESTAMP            TYPE            MESSAGE
2023-02-03T18:21:15Z Received        Task received by client
2023-02-03T18:21:15Z Task Setup      Building Task Directory
2023-02-03T18:26:15Z Alloc Unhealthy Task not running by deadline

Detached the disk and the app was successfully launched outside fra. Luckily, this app doesn’t actually persist anything interesting.

Would very much appreciate more insight into this. Was Vault degraded? Was this affected by the routing issues? Were the encryption keys lost? Is the volume corrupt?