Error failed to wait for VM in started state: failed to wait for machine to be ready

Whenever I run flyctl deploy for a Machines app, about half the time I see:

deployment-01GD6HE5033WMFZKR5N14XYDTA: digest: sha256:99de07d25df38de98cfbbeaf68923a4c05deabb3267e786aceadce08e05dcfa6 size: 2822
--> Pushing image done
image size: 409 MB
Deploying with rolling strategy ✓
Error failed to wait for VM 73287176f52285 in started state: failed to wait for machine to be ready

But flyctl m list -a <app-name> shows that VM 73287176f52285 has started and is running the latest image (it responds to requests, too). However, because the deployment strategy is rolling (the default for Machines, I guess?), none of the other machines in the same app get that image deployed.

What could be going wrong here?
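To make the symptom concrete, here is a toy model of the two strategies, assuming that a rolling deploy updates one machine, waits for it to report ready, and aborts the whole deploy on the first failed wait, while an immediate deploy updates every machine without waiting. This is a hypothetical sketch of the behavior described above, not flyctl's code; `Machine`, `rolling_deploy`, `immediate_deploy` and the `becomes_ready` flag are made-up names.

```python
"""Toy model of rolling vs. immediate machine updates.

Hypothetical: NOT flyctl's implementation, only a sketch of the
behavior described in the post.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Machine:
    machine_id: str
    image: str
    # Hypothetical flag: will this machine report "ready" before the
    # deploy's readiness wait gives up?
    becomes_ready: bool = True


class DeployError(Exception):
    pass


def rolling_deploy(machines: list[Machine], new_image: str) -> list[str]:
    """Update one machine at a time; abort at the first failed readiness wait."""
    updated = []
    for m in machines:
        m.image = new_image          # the update itself succeeds...
        updated.append(m.machine_id)
        if not m.becomes_ready:      # ...but the wait for "ready" fails
            raise DeployError(
                f"failed to wait for VM {m.machine_id} in started state"
            )
    return updated


def immediate_deploy(machines: list[Machine], new_image: str) -> list[str]:
    """Update every machine without waiting for any of them to report ready."""
    for m in machines:
        m.image = new_image
    return [m.machine_id for m in machines]


if __name__ == "__main__":
    fleet = [
        Machine("73287176f52285", "old", becomes_ready=False),
        Machine("m2", "old"),
        Machine("m3", "old"),
    ]
    try:
        rolling_deploy(fleet, "new")
    except DeployError as err:
        print(err)
    # The first machine got the new image; the rest were never touched.
    print([m.image for m in fleet])
```

In this model the first machine ends up on the new image while the others stay on the old one, which matches what flyctl m list shows; an immediate deploy skips the wait entirely, which would also explain why it feels so much faster.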

fly.toml, services section:

[[services]]
  internal_port = 8055
  protocol = "tcp"

  [services.concurrency]
    hard_limit = 75
    soft_limit = 60
    type = "connections"

  [[services.ports]]
    handlers = ["tls"]
    tls_options = { alpn = ["h2", "http/1.1"] }
    port = 443

  [[services.ports]]
    handlers = ["tls"]
    tls_options = { alpn = ["h2", "http/1.1"] }
    port = 8055

  [[services.tcp_checks]]
    grace_period = "15s"
    interval = "30s"
    restart_limit = 3
    timeout = "2s"

Just a wild guess: is flyctl deploy only waiting up to a default timeout of 60s?
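If that guess is right, the failure mode looks like a client-side wait loop with a fixed deadline: the machine does reach "started", just after the client has already given up. Below is a minimal sketch of such a loop, assuming a polling client and a fixed timeout; it is not flyctl's actual wait logic. `wait_for_state`, `FakeClock` and `machine_that_starts_after` are hypothetical names, the state labels "created"/"started" are used for illustration, and the 60s timeout and 75s start time are example numbers only.

```python
"""Sketch of a client-side "wait for state" loop with a fixed deadline.

Hypothetical: models the guess that the deploy gives up after a fixed
timeout even if the machine reaches "started" shortly afterwards.
"""
from __future__ import annotations

import time
from typing import Callable


class WaitTimeout(Exception):
    pass


def wait_for_state(
    get_state: Callable[[], str],
    want: str,
    timeout_s: float,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll get_state() until it returns `want`; return elapsed seconds.

    Raises WaitTimeout if the deadline passes first.
    """
    start = clock()
    deadline = start + timeout_s
    while True:
        # Check the state first so one last poll happens at the deadline.
        if get_state() == want:
            return clock() - start
        if clock() >= deadline:
            raise WaitTimeout(f"failed to wait for machine to be {want!r}")
        sleep(poll_interval_s)


class FakeClock:
    """Simulated time so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def machine_that_starts_after(seconds: float, clock: FakeClock) -> Callable[[], str]:
    """Hypothetical machine that reports 'started' after `seconds` of simulated time."""
    return lambda: "started" if clock.time() >= seconds else "created"


if __name__ == "__main__":
    for timeout in (60, 120):
        clock = FakeClock()
        get_state = machine_that_starts_after(75, clock)
        try:
            took = wait_for_state(get_state, "started", timeout,
                                  clock=clock.time, sleep=clock.sleep)
            print(f"timeout={timeout}s: started after {took:.0f}s")
        except WaitTimeout as err:
            print(f"timeout={timeout}s: {err}")
```

With a 60s deadline the wait fails even though the same machine reports "started" at 75s, which is consistent with the machine showing up as healthy in flyctl m list after the deploy has already errored out. Whether flyctl's real timeout is 60s, and whether it can be raised, would need checking against the flyctl docs.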

The immediate strategy was lightning quick compared to rolling. Is there a reason fly deploy for Machines uses rolling instead of canary (the usual default) or even immediate?

fly deploy --config fly.machines.toml --remote-only --image <img> --strategy immediate
==> Verifying app config
--> Verified app config
==> Building image
Searching for image '<img>' remotely...
image found: img_deadb33fd3adbeef
Deploying with immediate strategy ✓

cc: @jsierles
