Hey! First-time poster here. I’ve got an app on Fly.io. Working perfectly so far.
However, all of my newly deployed versions are queuing up; the original version won’t stop or shut down.
I’ve attempted to run `flyctl vm stop`, to no avail: it still says the “desired state” is stopped and the “current state” is running. I’ve also tried scaling to 0, and while it reports the status as “dead”, the VM is still running.
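For reference, these are the exact commands I ran (the alloc ID below is a placeholder; the real one comes from `fly status`):

```shell
# Stop the stuck instance directly (placeholder alloc ID)
flyctl vm stop 5d14b2e7

# Scale the app down to zero instances;
# the status flipped to "dead" but the VM kept running
flyctl scale count 0
```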
I’ve confirmed in my fly.toml that `kill_timeout` is indeed set to `5`, so the VM should have been killed after 5 seconds of no response, but that’s not happening.
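In case it helps, the relevant part of my fly.toml looks like this (the app name is a placeholder):

```toml
# fly.toml (excerpt) -- app name is a placeholder
app = "my-app"

# Seconds Fly waits after sending the shutdown signal
# before force-killing the VM
kill_timeout = 5
```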
Is there anything else I can try to stop the VM? I currently have 5 new versions queued up for deployment, and the stubborn version is still sitting there running. There may be open WebSocket connections, but I’m fine with those being killed; it just won’t do it.
I deleted the app and recreated it, and that seemed to work… not sure why that was necessary, though.
I think that might have been a coincidence. We fixed a bug this morning (like 20 minutes ago) that was preventing some VMs from stopping. Sorry about that, not a great experience.
Yikes. Details, if share-worthy? Especially since VMs not quitting has billing ramifications.
I’m hitting this issue now on my climate-coolers instance. It won’t shut down so that a new version with updated secrets can deploy.
We dug into this and were able to resolve an issue on the hosts where your VMs were hanging. On our end it looks like your app is working normally now, so hopefully that’s what you’re also seeing! Thank you for letting us know!
I’m not sure why, but it’s happening once again: I’ve got 4 hanging instances.
oof, sorry to hear that! is this the same app as before (climate-coolers)?
Exactly. Maybe I’m doing something wrong but I’m not really sure
I’m not sure yet either! I’d imagine you might have already done this, but you can view all instances that have been placed in the past 7 days with `fly status --all`, and per-instance logs with `fly vm status <alloc-id>` or `fly logs -i <alloc-id>`.
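Putting those together as a quick diagnostic pass (the alloc ID below is a placeholder; use the real IDs from the first command’s output):

```shell
# List every instance placed in the past 7 days, including stopped ones
fly status --all

# Inspect one instance's state and recent events (placeholder alloc ID)
fly vm status 5d14b2e7

# Tail logs for just that instance
fly logs -i 5d14b2e7
```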
I think we’ve got everything back up for you now.
It looks like this might have been on our end again: we were able to narrow the issue down to a specific host and think that we’ve applied a more permanent fix!
Thanks Eli, I appreciate the transparency!