Release revert/health check detailed logs?

Hello, recently one of our deploys was automatically rolled back by Fly, and I am having a hard time figuring out why.

I have several questions, so sorry if this should have been split into more than one post:

  1. How can I see the details of why a particular revert was executed? All I can infer comes from the services.tcp_checks section of the toml file, and I have no logs. Our own logs from that time only show the two original VMs, with no evidence that two new ones were ever created. Historical logs on the Monitoring page would be super useful here.
  2. The deploy version numbers seem to be off by one. I deployed again this morning, semi-successfully: it says the deploy failed but wasn’t rolled back, and the deploy number doesn’t match the CLI output (241 vs 240).
  3. I’m now in a “Failed deployment” state, even though my app seems to be working fine. In what state does Fly consider 1 instance healthy and 1 unhealthy, while also reporting that 2 instances are passing health checks? I’m not sure what to make of it.

Sorry for the text dump above, I’m just trying to get straight what happened. Let me know if I can clarify at all.


Try flyctl releases -a myapp --image. This is what I see for one of my apps (v1):

flyctl releases --image -a myapp
VERSION  STABLE  TYPE      STATUS     DESCRIPTION                  USER        DATE                  DOCKER IMAGE
v407     true    release   succeeded  Secrets updated              e@mail.tld  2023-01-20T15:10:02Z
v406     true    release   succeeded  Deploy image                 e@mail.tld  2023-01-07T17:19:47Z
v404     true    scale     succeeded  Scale VM count: ["app, 3"]   e@mail.tld  2023-01-04T18:30:55Z
v403     true    rollback  failed     Reverting to version 401     e@mail.tld  2023-01-04T18:23:05Z
v402     false   release   failed     Deploy image                 e@mail.tld  2023-01-04T18:22:04Z
v401     true    release   succeeded  Deploy image                 e@mail.tld  2023-01-04T17:38:26Z
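If you have a long release history to dig through, a small script over that text output can pick out the failed deploys and the rollbacks they triggered. This is a minimal sketch, not a flyctl feature: it assumes the column layout shown above (tab or 2+ space separated, starting with VERSION, STABLE, TYPE, STATUS, DESCRIPTION) and that rollback descriptions read “Reverting to version N”; parse_releases and rollbacks are hypothetical helper names. Note that in this sample the rollback shows up as its own release (v403), so it takes a version number too.

```python
# Minimal sketch (not an official flyctl feature): parse text output of
# `flyctl releases` like the listing above and pull out failed releases
# and rollbacks. Assumes columns are separated by a tab or 2+ spaces and
# start with VERSION, STABLE, TYPE, STATUS, DESCRIPTION.
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Release:
    version: int
    stable: bool
    type: str
    status: str
    description: str


def parse_releases(text: str) -> list[Release]:
    releases = []
    for line in text.strip().splitlines():
        cols = [c for c in re.split(r"\t|\s{2,}", line.strip()) if c]
        # Skip the header row and anything that isn't a "vNNN" release row.
        if len(cols) < 5 or not re.fullmatch(r"v\d+", cols[0]):
            continue
        releases.append(Release(int(cols[0][1:]), cols[1] == "true",
                                cols[2], cols[3], cols[4]))
    return releases


def rollbacks(releases: list[Release]) -> list[tuple[int, Optional[int], str]]:
    """Return (rollback version, version it reverted to, status) triples."""
    result = []
    for r in releases:
        if r.type == "rollback":
            m = re.search(r"Reverting to version (\d+)", r.description)
            result.append((r.version, int(m.group(1)) if m else None, r.status))
    return result


SAMPLE = """\
VERSION  STABLE  TYPE      STATUS     DESCRIPTION                 USER        DATE
v404     true    scale     succeeded  Scale VM count: ["app, 3"]  e@mail.tld  2023-01-04T18:30:55Z
v403     true    rollback  failed     Reverting to version 401    e@mail.tld  2023-01-04T18:23:05Z
v402     false   release   failed     Deploy image                e@mail.tld  2023-01-04T18:22:04Z
v401     true    release   succeeded  Deploy image                e@mail.tld  2023-01-04T17:38:26Z
"""

releases = parse_releases(SAMPLE)
print([r.version for r in releases if r.status == "failed"])  # [403, 402]
print(rollbacks(releases))  # [(403, 401, 'failed')]
```

You would paste your own `flyctl releases` output into SAMPLE (or read it from a file); splitting on runs of whitespace rather than single spaces keeps multi-word descriptions like “Deploy image” intact.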

There’s a flyctl history -a myapp command too, which may or may not be as useful…

Fly keeps app logs only on a best-effort basis, so if you want to persist logs, see: Fly Logs over NATS

From what you’ve written here, I wouldn’t be surprised if it is a bug with the Fly control plane.

In the past, scaling up and then back down again (e.g. flyctl scale count 6 -a myapp, followed by flyctl scale count 3 -a myapp) has worked nicely to get unstuck from situations like this.
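If you end up doing this more than once, the two steps can be scripted. A minimal sketch, assuming flyctl is on your PATH and using only the `flyctl scale count N -a APP` form quoted above; scale_bounce is a hypothetical helper name. By default it is a dry run that just returns the commands.

```python
# Hypothetical helper: "bounce" an app's VM count up and back down with flyctl,
# mirroring the manual workaround above. Only the `flyctl scale count N -a APP`
# form quoted in this thread is used; dry_run=True (the default) just returns
# the commands without running anything.
import subprocess


def scale_bounce(app: str, up: int, down: int, dry_run: bool = True) -> list[list[str]]:
    if up <= down:
        raise ValueError("'up' should be larger than 'down'")
    commands = [
        ["flyctl", "scale", "count", str(up), "-a", app],
        ["flyctl", "scale", "count", str(down), "-a", app],
    ]
    if not dry_run:
        for cmd in commands:
            # check=True raises on a non-zero exit, so we never scale down
            # if the scale-up command itself failed.
            subprocess.run(cmd, check=True)
    return commands


for cmd in scale_bounce("myapp", up=6, down=3):
    print(" ".join(cmd))
# flyctl scale count 6 -a myapp
# flyctl scale count 3 -a myapp
```

The sketch does not wait for the extra VMs to come up between the two steps, so in practice you may want to check on the app before scaling back down.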

Consider sharing the affected app name. It may help the Fly engineers lurking around here look into what actually happened.

If you spend more than $29/mo, consider subscribing to the Launch plan and then emailing support. See: Support: community vs email (Read this first) - #9 by eli

Thanks for your help! We’re spending way more than that, so I’ll look into changing our plan.
