Deployment issue: `release_command` timing out

I’m using apps-v2 and here’s my fly.toml:

kill_signal = "SIGINT"
kill_timeout = 5
primary_region = "sea"
processes = []

[metrics]
port = 9090
path = "/metrics"

[env]
  PORT = "9090"

[deploy]
    release_command = "sh /root/migrate.sh"

[http_service]
  auto_start_machines = true
  auto_stop_machines = true
  internal_port = 9090
  force_https = true
  [http_service.concurrency]
    type = "connections"
    soft_limit = 800
    hard_limit = 1000

[checks]
  [checks.httpcheck]
    grace_period = "30s"
    interval = "15s"
    method = "get"
    path = "/metrics"
    port = 9090
    timeout = "10s"
    type = "http"
  [checks.tcpcheck]
    grace_period = "30s"
    interval = "15s"
    port = 9090
    timeout = "10s"
    type = "tcp"

And here’s the error:

Running [app_name] release_command: sh /root/migrate.sh
  Updating release_command machine 3d8d31ebe62d89
  Waiting for 3d8d31ebe62d89 to have state: stopped
Error: release command failed - aborting deployment. error running release_command machine: timeout reached waiting for machine to stopped failed to wait for VM 3d8d31ebe62d89 in stopped state: Get "https://api.machines.dev/v1/apps/[appname]/machines/3d8d31ebe62d89/wait?instance_id=[instance_id]&state=stopped&timeout=60": net/http: request canceled

Everything had been working fine since I migrated my app (v1 to v2): I made a few releases without any problems. I noticed you guys had an issue with deployments today.

However, release_command is now hanging, even when I change it to sh -c true:

[deploy]
    release_command = "sh -c true"

[the same error mentioned above happens, but shows Running release_command: sh -c true]

I can deploy if I remove the whole deploy block from my fly.toml, so I’m not sure if release_command is supported in v2. (I believe it is, as I was using it earlier today).

Or maybe there’s still an issue from the incident that happened today?

Thanks

Release commands are expected to exit after a certain amount of time. This error means the VM didn’t finish; it’s still running.

In general, you can run fly logs -i 3d8d31ebe62d89 to see what’s up with a release command VM. You probably don’t want these VMs running forever, so you should destroy any that are wedged.

That sh -c true command will run forever, so the deploy will time out waiting for it.


Wait I’m an idiot, sh -c true should exit immediately. Do you have the machine ID for that one?


I thought 3d8d31ebe62d89 was the ID?

ID              NAME
3d8d31ebe62d89 crimson-water-7315

Do you want the instance_id? (I redacted it because I’m still not sure what is OK to share here, but I think the instance_id should be fine):

https://api.machines.dev/v1/apps/[app_name]/machines/3d8d31ebe62d89/wait?instance_id=01GYG97G5JN95JE6B9EKS4PNJY&state=stopped&timeout=60

3d8d31ebe62d89 appears to be starting a server:

{"level":"info","time":"2023-04-20T21:31:02.015941625Z","message":"Listening on port :9090"}

I haven’t found one with the sh command yet. I found an earlier one that looked like that, then exited after an hour:

{"level":"fatal","error":"http: Server closed","time":"2023-04-20T21:25:38.46470001Z","message":"could not run"}

3d8d31ebe62d89 is the Machine ID that runs the release task. It’s like a normal machine, it just has no network services, doesn’t appear in DNS, and destroys itself on exit.

Hmmm, but doesn’t release_command run by itself? It also seems to be running either CMD or ENTRYPOINT.

My /root/migrate.sh looks like this:

migrate -path /root/migrations -database "$MIGRATE_DB_URL" -verbose up

I’ll trigger another build locally and follow the new machine log.

release_command overrides CMD. It will still run ENTRYPOINT.
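
To make that concrete: because release_command only replaces CMD, an ENTRYPOINT that starts the server unconditionally will also start it on the release machine, and that machine never reaches the stopped state. Here is a minimal sketch of an entrypoint that instead hands off to whatever arguments it receives. The script itself and the function name run_command are hypothetical, not taken from the app above.

```shell
#!/bin/sh
# Hypothetical entrypoint.sh sketch.
# Docker passes CMD to the ENTRYPOINT as arguments; on the release machine,
# release_command takes the place of CMD, so it arrives here as "$@".

run_command() {
  # Shared setup (if any) would go here, before handing off.
  # exec replaces this shell, so the command's exit status becomes the
  # machine's exit status and the machine can stop when the command ends.
  exec "$@"
}

# Only hand off when arguments were actually supplied.
if [ "$#" -gt 0 ]; then
  run_command "$@"
fi
```

With an exec-form ENTRYPOINT pointing at a script like this, a regular machine should run the image's CMD, while the release machine should run sh /root/migrate.sh and exit when the migration finishes.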


Ah. That explains why it was working earlier today.

I had to override an ENTRYPOINT from a new base image, and I didn’t know release_command would also run ENTRYPOINT. I thought that could be my issue, so I tried changing my Dockerfile to use ENTRYPOINT []. What I didn’t notice was that my release_command machine was always the same one, and that it had been running the whole time. Now everything makes sense.

Thanks for the tip on checking the release machine logs, didn’t know about it.

I believe I can open a PR against the App Configuration (fly.toml) page in the Fly Docs to update the release_command documentation? It doesn’t mention that ENTRYPOINT will always run, no matter what. I’m not sure it makes sense though, as the page does mention that you can use RELEASE_COMMAND=1 to control the behavior of your ENTRYPOINT.
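
For an entrypoint that can’t simply exec its arguments, the RELEASE_COMMAND=1 variable mentioned in the docs can be used to branch instead. A rough sketch, assuming the variable is set to 1 only on the release machine; the function name entrypoint_mode and the messages are made up for illustration:

```shell
#!/bin/sh
# Hypothetical sketch: choose the entrypoint's behavior from RELEASE_COMMAND.

entrypoint_mode() {
  # ${VAR:-} keeps this safe when the variable is unset (even under set -u).
  if [ "${RELEASE_COMMAND:-}" = "1" ]; then
    echo "release"
  else
    echo "serve"
  fi
}

case "$(entrypoint_mode)" in
  release) echo "release machine: skipping server startup" >&2 ;;
  serve)   echo "app machine: the server would be started here" >&2 ;;
esac
```

In the release branch the entrypoint would run only the migration and exit, so the release machine can stop instead of listening on port 9090.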

Thank you @kurt


Thanks for surfacing this, @brenol, and for taking the trouble to make a PR for us. Our config doc is now updated with this info.

