We just noticed that a few apps randomly died because of Redis connection errors to another Redis app deployed on Fly. Nothing brings them back?
Another app that is now dead and won't restart was last seen logging:
Pulling image failed
This happened a few times with no prior errors, and then the app died and won't recover.
The only way I have found in the past to unbrick dead apps is doing a fly deploy - but with a complicated CI/CD setup, that makes it hard to get apps back up ASAP.
Were these apps + Redis instances, by chance? We are pretty good at rescheduling apps, but when Redis needs to boot first, rescheduling may not work properly. Fixing this is currently our biggest ongoing project.
A fly secrets set is a simpler way to trigger the fly deploy process. fly restart just restarts VMs in place, so if none are scheduled it won't do anything.
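If you want to script that workaround outside your CI/CD pipeline, here is a minimal sketch in Python. It assumes the fly CLI is installed and authenticated on the machine running it. The secret name REDEPLOY_NONCE and the helper names are made up for illustration; any secret your app ignores should work, as long as its value actually changes.

```python
import subprocess
import time


def build_redeploy_command(app_name, secret_name="REDEPLOY_NONCE", value=None):
    """Build the argv for a `fly secrets set` call that changes one secret.

    REDEPLOY_NONCE is a hypothetical throwaway secret; the app never reads it.
    """
    if value is None:
        # Use the current Unix time so the value differs on every call.
        value = str(int(time.time()))
    return ["fly", "secrets", "set", f"{secret_name}={value}", "--app", app_name]


def redeploy(app_name):
    """Run the command; raises CalledProcessError if the fly CLI exits non-zero."""
    subprocess.run(build_redeploy_command(app_name), check=True)
```

Calling redeploy("my-app") (with your own app name) runs the equivalent of fly secrets set REDEPLOY_NONCE=<timestamp> --app my-app.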
There's not much you can do about this right this second. It will improve, and last night's outage was somewhat unique, so there's a low chance of the same problem occurring again.