trying to deploy a simple web app, and it keeps failing with an obscure error message:
3 desired, 2 placed, 1 healthy, 1 unhealthy [health checks: 1 total, 1 passing]
--> v11 failed - Failed due to unhealthy allocations - rolling back to job version 10 and deploying as v12
watching flyctl status at the same time, i see something like this:
Description = Deployment is running
Instances = 3 desired, 2 placed, 1 healthy, 0 unhealthy
Instances
ID       PROCESS VERSION REGION DESIRED STATUS  HEALTH CHECKS      RESTARTS CREATED
12e25bfc app     13 ⇡    lhr    run     pending                    0        5m34s ago
e86671e2 app     13 ⇡    sin    run     running 1 total, 1 passing 0        5m56s ago
c39db057 app     12      lax    run     running 1 total, 1 passing 0        2022-10-06T22:39:47Z
the instance in lhr never even seems to begin running the health checks. there’s no reason this static web app would pass health checks in sin and fail in lhr.
it eventually succeeded! FWIW this is not my first experience with fly being flaky during deploys, but when i bring it up to support they always reassure me that it’s a configuration or a UI issue and that fundamentally deploys are fine. i’m not totally sure – i’m curious what’s in the works in fly’s deploy process, because i’m having a hard time trusting it.
You can usually tell if it’s us or you when you debug a failed deploy. You need to run fly status --all to find the failing instance ID, then fly vm status <id>. The vm status command will give you details about what actually failed.
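To make the first step concrete, here is a minimal Python sketch that picks out the instances worth inspecting from the text printed by fly status --all. The row layout is assumed from the output pasted earlier in this thread, and find_suspect_instances is a hypothetical helper, not part of flyctl.

```python
import re

# Row layout assumed from the `fly status` output quoted in this thread:
# ID PROCESS VERSION [⇡] REGION DESIRED STATUS [HEALTH CHECKS] RESTARTS CREATED
ROW = re.compile(
    r"^(?P<id>[0-9a-f]{8})\s+(?P<process>\S+)\s+(?P<version>\d+)(?:\s+⇡)?"
    r"\s+(?P<region>\w+)\s+(?P<desired>\w+)\s+(?P<status>\w+)\s+"
    r"(?P<checks>\d+ total(?:, \d+ \w+)*)?\s*(?P<restarts>\d+)\s+(?P<created>.+)$"
)


def find_suspect_instances(status_text):
    """Return IDs of instances that are not running or not fully passing checks."""
    suspects = []
    for line in status_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # headers, blank lines, anything that is not an instance row
        checks = match.group("checks")
        total = passing = 0
        if checks:
            # "1 total, 1 passing" -> {"total": 1, "passing": 1}
            counts = {name: int(n) for n, name in re.findall(r"(\d+) (\w+)", checks)}
            total = counts.get("total", 0)
            passing = counts.get("passing", 0)
        # An instance with no reported checks is treated as suspect too.
        if match.group("status") != "running" or not checks or passing < total:
            suspects.append(match.group("id"))
    return suspects
```

Each ID it returns can then be passed to fly vm status <id> for the details.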
When it’s our infrastructure failing, you’ll typically see weird errors about “failed to pull image”. Most deploy failures are not this, though: they’re usually caused by health checks failing, the app crashing, or health checks taking too long to start passing.
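As a rough illustration of that triage, the sketch below sorts the text of a fly vm status dump into those buckets. It is plain string matching on phrases that appear in this thread ("failed to pull image", "Exit Code: N", "N critical"); the exact wording flyctl prints may differ, and classify_failure is a hypothetical helper.

```python
import re


def classify_failure(vm_status_text):
    """Guess the likely cause of a failed deploy from `fly vm status` text."""
    text = vm_status_text.lower()
    # Phrase quoted above as the usual sign of a platform-side problem.
    if "failed to pull image" in text:
        return "platform"
    # A non-zero exit code in the events means the app process itself died.
    if re.search(r"exit code: [1-9]\d*", text):
        return "app crashing"
    # Checks reported as critical while the process stays up.
    if re.search(r"\d+ critical", text):
        return "health checks failing"
    return "unknown"
```

The crash test comes before the health check test on purpose: an app that exits right after starting will also show critical checks, and the exit is the more useful thing to chase.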
I had the same issue yesterday, and it is still happening.
Here are my logs:
Instance
ID       PROCESS VERSION REGION DESIRED STATUS  HEALTH CHECKS       RESTARTS CREATED
db104ab0 app     55      cdg    run     running 1 total, 1 critical 2        57s ago
Recent Events
TIMESTAMP TYPE MESSAGE
2022-10-26T06:50:05Z Received Task received by client
2022-10-26T06:50:05Z Task Setup Building Task Directory
2022-10-26T06:50:44Z Started Task started by client
2022-10-26T06:50:48Z Terminated Exit Code: 100
2022-10-26T06:50:48Z Restarting Task restarting in 1.128032948s
2022-10-26T06:50:55Z Started Task started by client
2022-10-26T06:50:59Z Terminated Exit Code: 100
2022-10-26T06:50:59Z Restarting Task restarting in 1.026185638s
2022-10-26T06:51:06Z Started Task started by client
2022-10-26T06:50:44Z [info]Starting virtual machine
2022-10-26T06:50:44Z [info]Starting init (commit: 249766e)…
2022-10-26T06:50:44Z [info]Preparing to run: /entrypoint as root
2022-10-26T06:50:44Z [info]2022/10/26 06:50:44 listening on [fdaa:0:b967:a7b:ae02:db10:4ab0:2]:22 (DNS: [fdaa::3]:53)
2022-10-26T06:50:46Z [info]s6-overlay-suexec: fatal: can only run as pid 1
2022-10-26T06:50:46Z [info]Starting clean up.
2022-10-26T06:50:53Z [info]Starting instance
2022-10-26T06:50:54Z [info]Configuring virtual machine
2022-10-26T06:50:54Z [info]Pulling container image
2022-10-26T06:50:54Z [info]Unpacking image
2022-10-26T06:50:54Z [info]Preparing kernel init
2022-10-26T06:50:55Z [info]Configuring firecracker
2022-10-26T06:50:55Z [info]Starting virtual machine
2022-10-26T06:50:55Z [info]Starting init (commit: 249766e)…
2022-10-26T06:50:55Z [info]Preparing to run: /entrypoint as root
2022-10-26T06:50:55Z [info]2022/10/26 06:50:55 listening on [fdaa:0:b967:a7b:ae02:db10:4ab0:2]:22 (DNS: [fdaa::3]:53)
2022-10-26T06:50:57Z [info]s6-overlay-suexec: fatal: can only run as pid 1
2022-10-26T06:50:57Z [info]Starting clean up.
2022-10-26T06:51:04Z [info]Starting instance
2022-10-26T06:51:04Z [info]Configuring virtual machine
2022-10-26T06:51:04Z [info]Pulling container image
2022-10-26T06:51:05Z [info]Unpacking image
2022-10-26T06:51:05Z [info]Preparing kernel init
2022-10-26T06:51:05Z [info]Configuring firecracker
2022-10-26T06:51:06Z [info]Starting virtual machine
2022-10-26T06:51:06Z [info]Starting init (commit: 249766e)…
2022-10-26T06:51:06Z [info]Preparing to run: /entrypoint as root
2022-10-26T06:51:06Z [info]2022/10/26 06:51:06 listening on [fdaa:0:b967:a7b:ae02:db10:4ab0:2]:22 (DNS: [fdaa::3]:53)
2022-10-26T06:51:07Z [info]s6-overlay-suexec: fatal: can only run as pid 1
2022-10-26T06:51:08Z [info]Starting clean up.
→ v55 failed - Failed due to unhealthy allocations - rolling back to job version 54 and deploying as v56
I have been getting this "s6-overlay-suexec: fatal: can only run as pid 1" error since yesterday, without any changes to fly.toml or the Dockerfile.