Hello, recently one of our deploys got automatically rolled back by Fly and I am having a hard time investigating.
I have several questions, so sorry if this should have been split into more than one post:
- How can I see the details of why a particular revert was executed? All I can go on is the services.tcp_checks in the toml file, but I have no logs. In our logs from that time I only see the two original VMs, and no evidence that two new ones were ever created at all. Historical logs from the Monitoring page would be super useful here.
- The deploy version numbers seem to be off by one? I deployed again this morning semi-successfully: it says the deploy failed but wasn’t rolled back, and the deploy number doesn’t seem to match up with the CLI output (241 vs 240).
- I’m now in a “Failed deployment” state, even though my app seems to be working fine. What is the state in which Fly considers 1 instance healthy and 1 unhealthy, but also that 2 instances are passing health checks? I’m not sure what to make of it.
Sorry for the text dump above, I’m just trying to get straight what happened. Let me know if I can clarify at all.
flyctl releases -a myapp --image. This is what I see for one of my apps (v1):
flyctl releases --image -a myapp
VERSION  STABLE  TYPE      STATUS     DESCRIPTION                 USER                            DATE                  DOCKER IMAGE
v407     true    release   succeeded  Secrets updated             firstname.lastname@example.org  2023-01-20T15:10:02Z  registry.fly.io/myapp@sha256:e026a18
v406     true    release   succeeded  Deploy image                email@example.com               2023-01-07T17:19:47Z  registry.fly.io/myapp@sha256:e026a18
v404     true    scale     succeeded  Scale VM count: ["app, 3"]  firstname.lastname@example.org  2023-01-04T18:30:55Z  registry.fly.io/myapp@sha256:e433b12
v403     true    rollback  failed     Reverting to version 401    email@example.com               2023-01-04T18:23:05Z  registry.fly.io/myapp@sha256:e433b12
v402     false   release   failed     Deploy image                firstname.lastname@example.org  2023-01-04T18:22:04Z  registry.fly.io/myapp@sha256:4f1e04f
v401     true    release   succeeded  Deploy image                email@example.com               2023-01-04T17:38:26Z  registry.fly.io/myapp@sha256:e433b12
There is also the flyctl history -a myapp command, which may or may not be as useful…
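If the release listing gets long, a small script can pick out the failed releases and any gaps in the version sequence. The following is only a sketch in Python: it assumes the column layout matches the flyctl releases --image output pasted in this thread (other flyctl versions may print something different), and the names Release, parse_releases, failed_releases and missing_versions are made up for illustration.

```python
import re
from typing import List, NamedTuple


class Release(NamedTuple):
    version: int
    stable: bool
    type: str
    status: str
    description: str
    user: str
    date: str
    image: str


# Assumed row layout (taken from the listing in this thread):
# VERSION STABLE TYPE STATUS DESCRIPTION USER DATE DOCKER-IMAGE
# DESCRIPTION may contain spaces, so it is matched lazily up to the e-mail.
ROW = re.compile(
    r"^v(\d+)\s+(true|false)\s+(\S+)\s+(\S+)\s+(.*?)\s+(\S+@\S+)\s+"
    r"(\d{4}-\d{2}-\d{2}T\S+)\s+(\S+)$"
)


def parse_releases(text: str) -> List[Release]:
    """Parse release rows; lines that do not look like a row are skipped."""
    releases = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header, prompt line, or blank line
        version, stable, typ, status, desc, user, date, image = match.groups()
        releases.append(
            Release(int(version), stable == "true", typ, status, desc, user, date, image)
        )
    return releases


def failed_releases(releases: List[Release]) -> List[Release]:
    """Releases whose STATUS column reads 'failed'."""
    return [r for r in releases if r.status == "failed"]


def missing_versions(releases: List[Release]) -> List[int]:
    """Version numbers absent between the lowest and highest listed version."""
    seen = {r.version for r in releases}
    if not seen:
        return []
    return [v for v in range(min(seen), max(seen) + 1) if v not in seen]


# The listing pasted in this thread, used as sample input.
SAMPLE = r"""
VERSION STABLE TYPE STATUS DESCRIPTION USER DATE DOCKER IMAGE
v407 true release succeeded Secrets updated firstname.lastname@example.org 2023-01-20T15:10:02Z registry.fly.io/myapp@sha256:e026a18
v406 true release succeeded Deploy image email@example.com 2023-01-07T17:19:47Z registry.fly.io/myapp@sha256:e026a18
v404 true scale succeeded Scale VM count: ["app, 3"] firstname.lastname@example.org 2023-01-04T18:30:55Z registry.fly.io/myapp@sha256:e433b12
v403 true rollback failed Reverting to version 401 email@example.com 2023-01-04T18:23:05Z registry.fly.io/myapp@sha256:e433b12
v402 false release failed Deploy image firstname.lastname@example.org 2023-01-04T18:22:04Z registry.fly.io/myapp@sha256:4f1e04f
v401 true release succeeded Deploy image email@example.com 2023-01-04T17:38:26Z registry.fly.io/myapp@sha256:e433b12
"""

if __name__ == "__main__":
    releases = parse_releases(SAMPLE)
    for r in failed_releases(releases):
        print(f"v{r.version} {r.type} failed: {r.description} ({r.image})")
    print("versions not in listing:", missing_versions(releases))
```

A gap in the listing only means that version was not printed; on its own it does not explain why the numbers look off by one.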
App logs are kept around by Fly on a best-effort basis and so, if you’re looking to persist logs, see: Fly Logs over NATS
From what you’ve written here, I wouldn’t be surprised if it is a bug with the Fly control plane.
To get unblocked from such situations in the past, scaling up and then back down again (e.g. flyctl scale count <6> -a myapp, then flyctl scale count <3> -a myapp) has worked nicely.
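For repeat use, the scale-up-then-down workaround could be wrapped in a small script. This is a rough sketch, not a supported procedure: the helper names scale_bounce_commands and scale_bounce are hypothetical, the only flyctl invocation used is the flyctl scale count <n> -a <app> form quoted above, and the script does not wait for the new instances to become healthy before scaling back down.

```python
import subprocess
from typing import List


def scale_bounce_commands(app: str, normal_count: int, temporary_count: int) -> List[List[str]]:
    """Build the two flyctl invocations: scale up first, then back down."""
    if temporary_count <= normal_count:
        raise ValueError("temporary_count must be larger than normal_count")
    return [
        ["flyctl", "scale", "count", str(temporary_count), "-a", app],
        ["flyctl", "scale", "count", str(normal_count), "-a", app],
    ]


def scale_bounce(app: str, normal_count: int, temporary_count: int,
                 dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run, the default) or actually run them in order."""
    commands = scale_bounce_commands(app, normal_count, temporary_count)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops the sequence if flyctl exits with an error.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    # Dry run for an app named "myapp" that normally runs 3 instances.
    scale_bounce("myapp", normal_count=3, temporary_count=6)
```

The dry run is the default so the commands can be checked before anything touches the app; in practice it is probably worth pausing between the two steps and confirming the instances are healthy.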
Consider sharing the affected app name. It may help any Fly engineers lurking around to take a look and see what actually went on.
If your spend is more than $29/mo, consider subscribing to the Launch plan, then email support: Fly.io Support: community vs email (Read this first) - #9 by eli
Thanks for your help, we’re spending way more than that so I’ll check out getting our plan changed.