Hi @Yaeger, I know you’ve also been in touch through email support, but I wanted to follow up here on the issues you reported recovering your app after a single host in the ams region became unavailable.
First, it looks like you were running an app with one Machine and an attached Volume, which can only be recovered by restoring a daily snapshot to a new Volume and re-deploying (sketched below). This is why we strongly recommend against a single Volume for any app that stores important data or needs to be highly available (there are warnings all over our CLI and dashboard). A single Machine with an attached Volume is only okay for a couple of very limited use cases:

- your app is in development and you’re not yet worried about downtime, or
- your app can handle downtime and has a custom backup procedure.
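If it helps, here’s a rough sketch of that recovery flow with flyctl. The volume ID, snapshot ID, volume name, and app name below are all placeholders; check `fly volumes snapshots list --help` and `fly volumes create --help` for the exact flags on your flyctl version:

```
# List the daily snapshots for the Volume that was on the unavailable host
fly volumes snapshots list vol_xxxxxxxxxxxx

# Create a new Volume in the same region from the most recent snapshot
fly volumes create data --snapshot-id vs_xxxxxxxxxxxx --region ams --app my-app

# Re-deploy so a new Machine starts up with the restored Volume attached
fly deploy --app my-app
```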
When a host is unavailable, you should expect 408 timeout errors for any Machine/Volume API operations. However, to help clean up these resources after recovery, you can force-destroy a Machine on an unavailable host with `fly machine destroy --force` - see Minimizing Impact of Dead Hosts: New Features and Recovery Techniques for more info about this new option.
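For example (the Machine ID and app name here are placeholders):

```
# A normal destroy would time out with a 408 because the host is unreachable;
# --force removes the Machine record anyway so you can clean up and move on.
fly machine destroy 17811953c92e18 --force --app my-app
```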
The `fly scale count` error you encountered was a bug that caused `scale count` to fail on apps with unavailable Machines, and the opaque error message (“Oops, something went wrong!”) was not at all helpful for figuring out what was causing the failure. We’ve now published fixes for both of these issues - #3923 (fixing the regression in `fly scale count`) was published in v0.2.127 and #3850 (showing stack traces for flyctl errors) will be included in the next release.
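Once you’re on v0.2.127 or later, `fly scale count` should work again even while a Machine is unreachable. Something like this (the app name and count are just examples):

```
# Confirm your flyctl includes the #3923 fix
fly version

# Scale back up; this previously failed with the opaque error above
fly scale count 2 --app my-app
```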