deploys failing due to "unhealthy allocations"

trying to deploy a simple web app, and it keeps failing with an obscure error message:

 3 desired, 2 placed, 1 healthy, 1 unhealthy [health checks: 1 total, 1 passing]
--> v11 failed - Failed due to unhealthy allocations - rolling back to job version 10 and deploying as v12

watching flyctl status at the same time, i see something like this:

  Description = Deployment is running
  Instances   = 3 desired, 2 placed, 1 healthy, 0 unhealthy

Instances
ID              PROCESS VERSION REGION  DESIRED STATUS  HEALTH CHECKS           RESTARTS        CREATED
12e25bfc        app     13 ⇡    lhr     run     pending                         0               5m34s ago
e86671e2        app     13 ⇡    sin     run     running 1 total, 1 passing      0               5m56s ago
c39db057        app     12      lax     run     running 1 total, 1 passing      0               2022-10-06T22:39:47Z

the instance in lhr never even seems to begin running the health checks. there’s no reason this static web app would pass health checks in sin and fail in lhr.

it eventually succeeded! FWIW this is not my first experience with fly being flaky during deploys, but when i bring it up to support they always reassure me that it’s a configuration or a UI issue and that fundamentally deploys are fine. i’m not totally sure – i’m curious what’s in the works in fly’s deploy process, because i’m having a hard time trusting it.

This is likely due to registry issues we’re troubleshooting: Fly.io Status - Intermittent image pull failures from registry.fly.io

You can usually tell if it’s us or you when you debug a failed deploy. You need to run fly status --all to find the failing instance ID, then fly vm status <id>. The vm status command will give you details about what actually failed.

When it’s our infrastructure failing, you’ll typically see weird errors about “failed to pull image”. Most deploy failures are something else, though: usually health checks failing, the app crashing, or health checks taking too long to start passing.
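
As a rough illustration of that triage, here is a minimal Python sketch that sorts pasted `fly vm status <id>` output into "platform" or "app" buckets. The function name `classify_failure` and the marker strings are assumptions for illustration, taken only from the messages quoted in this thread; they are not an official list of flyctl output.

```python
import re

# Marker strings are assumptions drawn from messages quoted in this thread.
PLATFORM_MARKERS = ("failed to pull image",)

APP_PATTERNS = (
    re.compile(r"exit code: [1-9]\d*", re.IGNORECASE),  # non-zero exit code
    re.compile(r"\d+ critical", re.IGNORECASE),         # failing health check
)


def classify_failure(status_text: str) -> str:
    """Guess whether a failed deploy points at the platform or at the app."""
    lowered = status_text.lower()
    # Image pull errors suggest an infrastructure problem, so check them first.
    if any(marker in lowered for marker in PLATFORM_MARKERS):
        return "platform"
    # Crashes and critical health checks suggest the app itself.
    if any(pattern.search(status_text) for pattern in APP_PATTERNS):
        return "app"
    return "unknown"
```

Anything that lands in "unknown" still needs a human to read the full output; the sketch only encodes the two cases described above.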

@kurt
Hello,

I had the same issue yesterday and it keeps going on.

Here are my logs:

Instance
ID              PROCESS VERSION REGION  DESIRED STATUS  HEALTH CHECKS           RESTARTS        CREATED
db104ab0        app     55      cdg     run     running 1 total, 1 critical     2               57s ago

Recent Events
TIMESTAMP               TYPE            MESSAGE
2022-10-26T06:50:05Z    Received        Task received by client
2022-10-26T06:50:05Z    Task Setup      Building Task Directory
2022-10-26T06:50:44Z    Started         Task started by client
2022-10-26T06:50:48Z    Terminated      Exit Code: 100
2022-10-26T06:50:48Z    Restarting      Task restarting in 1.128032948s
2022-10-26T06:50:55Z    Started         Task started by client
2022-10-26T06:50:59Z    Terminated      Exit Code: 100
2022-10-26T06:50:59Z    Restarting      Task restarting in 1.026185638s
2022-10-26T06:51:06Z    Started         Task started by client
2022-10-26T06:50:44Z [info]Starting virtual machine
2022-10-26T06:50:44Z [info]Starting init (commit: 249766e)…
2022-10-26T06:50:44Z [info]Preparing to run: /entrypoint as root
2022-10-26T06:50:44Z [info]2022/10/26 06:50:44 listening on [fdaa:0:b967:a7b:ae02:db10:4ab0:2]:22 (DNS: [fdaa::3]:53)
2022-10-26T06:50:46Z [info]s6-overlay-suexec: fatal: can only run as pid 1
2022-10-26T06:50:46Z [info]Starting clean up.
2022-10-26T06:50:53Z [info]Starting instance
2022-10-26T06:50:54Z [info]Configuring virtual machine
2022-10-26T06:50:54Z [info]Pulling container image
2022-10-26T06:50:54Z [info]Unpacking image
2022-10-26T06:50:54Z [info]Preparing kernel init
2022-10-26T06:50:55Z [info]Configuring firecracker
2022-10-26T06:50:55Z [info]Starting virtual machine
2022-10-26T06:50:55Z [info]Starting init (commit: 249766e)…
2022-10-26T06:50:55Z [info]Preparing to run: /entrypoint as root
2022-10-26T06:50:55Z [info]2022/10/26 06:50:55 listening on [fdaa:0:b967:a7b:ae02:db10:4ab0:2]:22 (DNS: [fdaa::3]:53)
2022-10-26T06:50:57Z [info]s6-overlay-suexec: fatal: can only run as pid 1
2022-10-26T06:50:57Z [info]Starting clean up.
2022-10-26T06:51:04Z [info]Starting instance
2022-10-26T06:51:04Z [info]Configuring virtual machine
2022-10-26T06:51:04Z [info]Pulling container image
2022-10-26T06:51:05Z [info]Unpacking image
2022-10-26T06:51:05Z [info]Preparing kernel init
2022-10-26T06:51:05Z [info]Configuring firecracker
2022-10-26T06:51:06Z [info]Starting virtual machine
2022-10-26T06:51:06Z [info]Starting init (commit: 249766e)…
2022-10-26T06:51:06Z [info]Preparing to run: /entrypoint as root
2022-10-26T06:51:06Z [info]2022/10/26 06:51:06 listening on [fdaa:0:b967:a7b:ae02:db10:4ab0:2]:22 (DNS: [fdaa::3]:53)
2022-10-26T06:51:07Z [info]s6-overlay-suexec: fatal: can only run as pid 1
2022-10-26T06:51:08Z [info]Starting clean up.
→ v55 failed - Failed due to unhealthy allocations - rolling back to job version 54 and deploying as v56

I have been getting this "s6-overlay-suexec: fatal: can only run as pid 1" error since yesterday, without any changes to fly.toml or the Dockerfile.
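
For anyone reading a similar dump, a small Python sketch like the one below can confirm that the instance is crash-looping on the same exit code rather than failing a health check. The helper names `summarize_exits` and `looks_like_crash_loop` are hypothetical, and the pattern assumes the "Terminated ... Exit Code: N" wording shown in the Recent Events above.

```python
import re
from collections import Counter

# Assumes the "Terminated  Exit Code: N" event wording seen in this thread.
EXIT_RE = re.compile(r"Terminated\s+Exit Code:\s*(\d+)")


def summarize_exits(events_text: str) -> Counter:
    """Count how often each exit code appears in 'Terminated' events."""
    return Counter(int(m.group(1)) for m in EXIT_RE.finditer(events_text))


def looks_like_crash_loop(events_text: str, threshold: int = 2) -> bool:
    """True when one non-zero exit code repeats at least `threshold` times."""
    counts = summarize_exits(events_text)
    return any(code != 0 and n >= threshold for code, n in counts.items())


sample = """\
2022-10-26T06:50:48Z Terminated Exit Code: 100
2022-10-26T06:50:48Z Restarting Task restarting in 1.128032948s
2022-10-26T06:50:59Z Terminated Exit Code: 100
"""
print(summarize_exits(sample), looks_like_crash_loop(sample))
```

A repeated identical exit code right after "Preparing to run" points at the entrypoint itself, which matches the s6-overlay message in the logs.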

Ok I fixed it.
It turned out to be related to this error: 📣 [Laravel] Fix for error "Failed due to unhealthy allocations" - #6 by together