Manipulate volume while instance is crashing

I have an instance that’s crashing because it needs some manual fixes applied to an attached volume. How can I fix the volume if the app just keeps crashing at startup? I need access to the shell for the instance, which I would normally get with fly ssh console.


Is this one instance of a running production app with many instances? One way to do this would be to deploy temporarily with code that doesn’t try to load anything from the volume, so your service is able to run and you can log in.
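One way to make that temporary deploy less invasive is to gate the volume-loading step behind an environment variable, which you could set in the [env] section of fly.toml for the rescue deploy only. Below is a minimal Python sketch; `RESCUE_MODE`, `load_volume` and `serve` are hypothetical names standing in for your own startup steps, not anything Fly provides.

```python
import os

# Values of the hypothetical RESCUE_MODE variable that mean "skip the volume".
RESCUE_VALUES = {"1", "true", "yes"}


def rescue_mode(env=None):
    """Return True if RESCUE_MODE (a hypothetical env var) asks us to skip the volume."""
    env = os.environ if env is None else env
    return env.get("RESCUE_MODE", "").strip().lower() in RESCUE_VALUES


def startup(load_volume, serve, env=None):
    """Run the normal startup, but skip load_volume() when in rescue mode.

    load_volume and serve are placeholders for your app's own startup steps.
    """
    if rescue_mode(env):
        print("RESCUE_MODE set: skipping volume load", flush=True)
    else:
        load_volume()
    serve()
```

With the variable set, the process still starts serving, so the instance stays up and `fly ssh console` can reach it.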

If this is not possible, you could stop that instance and mount the volume to a different VM for debugging.

With a bit more detail about the app, such as the app name, I can help you through that process.

I think knowing some good strategies to keep the instance alive while I fix things up would help. The app name is dht-indexer.

As Joshua described, you can stop a particular instance with fly vm stop $allocationID.

We also recently added the ability to extend existing volumes with fly volumes extend, which can come in handy if a volume has run out of space.

If you need to actually dig into what you have in the volume, my first reaction would be something like:

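  1. Comment out services and any healthchecks in your fly.toml (basically everything except your app name and, so the volume stays attached, your [mounts] section), something like below
  2. Write a Dockerfile (or change the CMD in your existing one) that’s something like below
  3. fly deploy
  4. fly ssh console

```toml
# fly.toml
app = "<yourapp>"

# Keep your existing [mounts] section so the volume is still attached.
# The source and destination below are placeholders; use your own values.
[mounts]
  source = "<your_volume_name>"
  destination = "/data"
```

```dockerfile
# Dockerfile
FROM ubuntu

# Keep the VM running without doing anything, so you can SSH in
CMD tail -f /dev/null
```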

This should deploy an essentially empty Ubuntu VM to Fly, after which you can SSH in and do whatever you need to fix up your app. If you already have a Dockerfile, you could just change its CMD to something like the one above, or to whatever else keeps your app from crashing the VM.
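If you’d rather keep your real CMD but still be guaranteed a shell, a small wrapper can run the app and then idle instead of exiting, which has the same effect as the tail -f /dev/null trick. This is a sketch assuming Python 3 is installed in the image; `run_then_idle` and its behavior are my own illustration, not a Fly feature.

```python
import subprocess
import sys
import time


def run_then_idle(cmd, idle=True):
    """Run the app command; when it exits, optionally stay alive so SSH still works.

    Returns the app's exit code (only reached when idle is False).
    """
    result = subprocess.run(cmd)  # blocks until the app process exits
    print(f"app exited with code {result.returncode}", file=sys.stderr, flush=True)
    if idle:
        print("keeping VM alive for debugging", file=sys.stderr, flush=True)
        # Equivalent of `tail -f /dev/null`: block forever without using CPU.
        while True:
            time.sleep(3600)
    return result.returncode
```

To use it, you might save this as a hypothetical keepalive.py, append a call such as run_then_idle(["/app/your-binary"]) (placeholder path), and point your Dockerfile’s CMD at python3 keepalive.py.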

Extend is useful, thanks. I don’t think stopping the VM helps, though: I need it running to fix the volume. I just created another “rescue” app, thinking I could mount the volume from that, but discovered that volumes are locked to the app they were created for. If I deploy a rescue image, will that make all of the app’s instances unavailable?

Yes, my suggestion above is somewhat of a nuclear option; taking the app offline in order to fix your volume.

I didn’t realize the app was partially working with your existing image and that you wanted to keep those instances running. Makes sense!

Without knowing too much about your app, I would expect errors related to this data access could be handled/caught/rescued in such a way that your actual web server doesn’t go down and trigger restarts, especially while you get this volume fixed up. Maybe instead of dying it could log the error and respond with an error message.
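As an illustration of that idea, here is a minimal Python sketch using only the standard library: the data read returns an error instead of raising, data requests get a 503, and a separate health path keeps answering 200 so the instance isn’t marked unhealthy. The file path /data/index.json, the /health path and port 8080 are all assumptions; adapt them to your app and to whatever path your health check actually hits.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

DATA_PATH = "/data/index.json"  # hypothetical file on the mounted volume


def load_index(path):
    """Return (data, error) instead of raising, so bad volume data can't kill the process."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f), None
    except (OSError, ValueError) as exc:  # missing/unreadable file or corrupt JSON
        return None, str(exc)


def route(request_path, data_path=DATA_PATH):
    """Map a request to (status, body). /health stays 200 even if the data is broken."""
    if request_path == "/health":
        return 200, "ok\n"
    data, err = load_index(data_path)
    if err is not None:
        print(f"volume read failed: {err}", flush=True)  # log instead of crashing
        return 503, "data temporarily unavailable\n"
    return 200, "ok: index loaded\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = route(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the server (port 8080 is an assumption; match your internal_port)."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Keeping the health response independent of the volume is the key design choice: a broken data file degrades individual requests instead of getting the whole VM restarted.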

As long as the VM instance is up and not deemed unhealthy (and therefore taken offline), you should be able to SSH into it to do any manual work you’d need to do.