[We want your opinion] Health check alerts

Health checks are how you find out when something goes wrong with your app. You can rely on the default checks generated for Apps V2, our proxy can tell you when certain things go wrong, our Postgres comes with default checks so you can understand your cluster's health, and you can also create your own.

So far, the UX we have around this is limited to showing something in your logs (if the failure comes from the proxy), showing it on your app's Monitoring page, or running fly checks list -a APPNAME. But all of that only tells you what is happening right now.

We’d love to hear what you think you need to get the most out of our health checks and improve the reliability of your apps.

Don’t really need it for Apps V2, since VMs recover like clockwork (we see at least 3 OOMs a day across 30+ machines due to the nature of the service we run [0]), and I no longer see zombie, phantom, or ghost VMs, which used to be a huge uptime problem.

That said, a webhook or email (to a custom address) on health-check failures (or on any VM-down event; better still, on an all-VMs-down event) would be neat.
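The difference between alerting on "any VM down" and "ALL VMs down" comes down to which state transitions fire an alert. Below is a minimal sketch of that trigger logic, assuming you already have periodic snapshots mapping each machine ID to whether its health checks pass. The snapshot format and the `alert_events` function are hypothetical; this is not a Fly API, and delivering the events by webhook or email is left out.

```python
from typing import Dict, List


def alert_events(prev: Dict[str, bool], curr: Dict[str, bool]) -> List[str]:
    """Return alert events for the transition between two health snapshots.

    Each snapshot maps a (hypothetical) machine ID to True if its health
    checks pass. Alerts fire only on transitions, so a machine that stays
    down does not trigger a new alert on every poll.
    """
    events: List[str] = []
    for machine_id, healthy in sorted(curr.items()):
        # Machines not present in the previous snapshot count as previously healthy.
        was_healthy = prev.get(machine_id, True)
        if was_healthy and not healthy:
            events.append(f"machine_down:{machine_id}")

    # "All down" fires once, when the fleet goes from partly healthy to fully down.
    all_down_now = bool(curr) and not any(curr.values())
    all_down_before = bool(prev) and not any(prev.values())
    if all_down_now and not all_down_before:
        events.append("all_down")
    return events
```

Someone who only cares about total outages would route just the `all_down` event to email and ignore the per-machine ones, which matches the point above that single VMs recovering on their own may not need an alert at all.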

[0] The Fly proxy needs to expose token-bucket-like admission control (burst + fill rate), because a barrage of requests is usually what causes these OOMs.
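For reference, here is a minimal sketch of the general token-bucket technique the footnote describes, with `burst` as the bucket capacity and `fill_rate` as tokens added per second. The class and parameter names are hypothetical; this is not a Fly proxy setting or API, just an illustration of how burst and fill rate interact.

```python
import time
from typing import Optional


class TokenBucket:
    """Token-bucket admission control.

    burst:     maximum number of tokens (how many requests can arrive at once)
    fill_rate: tokens added per second (sustained request rate)
    """

    def __init__(self, burst: float, fill_rate: float, now: Optional[float] = None) -> None:
        self.burst = burst
        self.fill_rate = fill_rate
        self.tokens = burst  # start full so an initial burst is admitted
        self.last = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Admit one request if a token is available; otherwise reject it."""
        if now is None:
            now = time.monotonic()
        # Refill lazily in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.fill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With `TokenBucket(burst=3, fill_rate=1.0, now=0.0)`, five requests arriving at t=0 are admitted, admitted, admitted, rejected, rejected, and one more is admitted after a second has passed. Refilling on each call instead of on a timer keeps the bucket cheap to hold per app or per client; rejected requests would be shed (for example with a 429) before they reach the VM and push it into an OOM.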

+1 to email/webhook on failure.
