To persist logs, one needs to set up fly-log-shipper.
What is restart_limit set to in the health check section (services.tcp_checks) of your app's fly.toml? If health checks fail or the app OOMs, Fly's control plane should ideally attempt to auto-restart the app up to restart_limit times (afaik).
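For reference, here is a rough sketch of where that setting lives in fly.toml. The port, intervals, and the restart_limit value below are illustrative assumptions, not recommendations, and the exact semantics may differ depending on the platform version your app runs on, so check it against your own config:

```toml
# Hypothetical example values; adjust to your app.
[[services]]
  internal_port = 8080   # assumed port your app listens on
  protocol = "tcp"

  [[services.tcp_checks]]
    interval = "15s"      # how often the check runs
    timeout = "2s"        # how long to wait before a check counts as failed
    grace_period = "5s"   # time allowed for the app to boot before checks start
    restart_limit = 6     # consecutive failures before a restart is attempted (afaik)
```

If restart_limit is 0 in your file, my understanding is that failing checks will not trigger a restart at all, which would be worth ruling out first.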
That said, in the past, when apps have gone down without warning and never come back up, it has been due to VM (and volume) migrations slipping through the cracks when decommissioning lemon hosts.
One solution is to run at least 2 instances, possibly in different regions.