App Shutting Down and won't restart?

I have my blog deployed on Fly, and this morning when I tried to show it to a friend I saw that it was down. When I investigated, it looked like it was shut down earlier today and then failed to restart immediately because the volume was in use. But it still hasn’t been retried several hours later?

I also tried running a fly restart and fly scale count 0 && fly scale count 1, but it still shows that it’s pending the placement of 1 node?

Hmm. I’m not sure about the subsequent stuck-as-pending issue, but as for why it hasn’t been retried: it may be because your fly.toml does not specify a restart_limit (restarting on failed checks is currently opt-in):

  • restart_limit: The number of consecutive HTTP check failures to allow before attempting to restart the VM. The default is 0, which disables restarts based on failed HTTP health checks.

So I’d recommend adding a health check if you don’t already have one, and setting restart_limit to, e.g., 1. That way the app will be restarted after a failed check.
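
As a rough sketch of what that could look like, here is an HTTP check with restart_limit set inside a services block of fly.toml. The port, path, interval, and timeout values are assumptions chosen for illustration, not taken from the poster's app, and the key names follow the services-style fly.toml layout as I understand it, so check them against the current configuration reference before relying on this:

```toml
# Illustrative fragment only; merge into your existing fly.toml.
[[services]]
  internal_port = 8080   # assumption: the port your app listens on
  protocol = "tcp"

  [[services.http_checks]]
    interval = 10000       # milliseconds between checks (assumed value)
    grace_period = "5s"    # time allowed for the app to boot before checks count
    method = "get"
    path = "/"             # assumption: any route that returns 2xx when healthy
    protocol = "http"
    timeout = 2000         # milliseconds before a check counts as failed (assumed value)
    restart_limit = 1      # restart the VM after 1 consecutive failed check; 0 disables
```

Note that a restart triggered this way only helps if the host has room to start the VM again; it would not by itself fix an instance that is stuck pending placement.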

Someone from Fly will need to investigate the volume if that continues, unless it actually was being written to at the time. I’m not sure.

My fly.toml is public on GitHub and specifies a restart limit of 5 attempts: blog/fly.toml at master · mmmries/blog · GitHub

It looks to me like Fly is expecting it to be running, because the status shows “pending” with a scale count of 1 set?

I just checked again and now my app is running with a single instance. I didn’t change anything or run any scale commands; it just finally started 5 hours after my last scale command???

This is likely due to a capacity issue on a host in London. We’re working on it!

shared-cpu-1x VMs with a volume attached are the most likely to fail due to capacity issues since we can’t (yet) easily move volumes around. When hosts get overloaded the free tier VMs are the first to get shut down.

If you can run two instances, you shouldn’t notice when this happens. If you can only run one instance (because you’re using SQLite or something), you may have to bear with us for a few days while we get things moved around.
