If you’re reading this now, you should know that many of these features never made it out of preview. Specifically, the Slack and PagerDuty handlers have been removed.
We built this on top of Sensu. Sensu was the wrong choice for a highly multitenant health checking system. We’d like to finish this feature someday, but for now it’s not something we are able to support.
We’re launching this next week. If you’re following along here, you get to try it first (and give us pricing feedback!).
You can now install health checks in Fly apps, and configure them to alert you via PagerDuty or Slack when they fail.
This means you can configure your apps to wake you up at all hours of the night when things go wrong, and then fix them so you can sleep the next night. For example, if you’re building a PostgreSQL cluster and the cluster somehow loses its leader, you might be greeted by a PagerDuty alert.
A little DBA elbow grease and you can satisfy PagerDuty.
Configuring health checks
Health checks can be scripts, TCP connections, or HTTP tests. The fly.toml in our example PostgreSQL app looks like this:
# stop accepting new connections while existing sessions complete
kill_signal = "SIGTERM"
# allow 5 minutes to cleanly shutdown
kill_timeout = 300
[checks.master-elected]
type = "script"
interval = 5000
command = "/fly/master_elected.sh"
restart_limit = 0
This calls a script named /fly/master_elected.sh every 5 seconds (interval is in milliseconds). If the script exits with 0, no alerts. If it exits with 1 or greater, alerts! Here’s what this one actually does:
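#!/bin/bash
set -e
# Export the variables defined in /data/.env so stolonctl can reach the cluster
export $(cat /data/.env | xargs)
status=$(stolonctl status)
# Name of the current master keeper; empty if no master has been elected
mk=$(echo "$status" | grep "Master Keeper" | awk '{print $3}' || true)
if [ -z "$mk" ]; then
  # No master: print the full status so it shows up in the alert, then fail
  echo "${status}"
  exit 2
fi
# Address of the master keeper, with the :5432 port suffix stripped
ip=$(echo "$status" | grep "^${mk}" | awk '{print $3}' | sed -e 's/:5432$//')
# This VM's own private (6PN) address; awk splits on tabs or spaces
self=$(grep fly-local-6pn /etc/hosts | awk '{print $1}')
if [ "$ip" == "$self" ]; then
  echo "Master: self"
  exit 0
fi
# Another node is master; the cluster still has a leader, so the check passes
echo "Master: $ip"
exit 0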
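Because a script check only looks at the exit status, you can try a script by hand before deploying it. Here is a minimal sketch in bash; `run_check` is a hypothetical helper (not part of flyctl) that just applies the rule described above: 0 passes, 1 or greater alerts.

```bash
#!/bin/bash
# Hypothetical helper (not part of flyctl): run a check command and report
# it the way a script check is judged: exit 0 passes, 1 or greater alerts.
run_check() {
  local output status=0
  # Capture stdout and stderr together; "|| status=$?" records a failing
  # exit code without tripping "set -e" in the calling shell.
  output=$("$@" 2>&1) || status=$?
  if [ "$status" -eq 0 ]; then
    echo "passing: ${output}"
  else
    echo "failing (exit ${status}): ${output}"
  fi
  # Propagate the original exit code so callers can act on it too.
  return "$status"
}
```

Inside the VM, `run_check /fly/master_elected.sh` should print something like `passing: Master: self` on the current master, and a `failing (exit 2): ...` line followed by the stolonctl status when no master is elected.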
PagerDuty handler
Configuring PagerDuty alerts is simple. Create a “Service” in PagerDuty, choose the “Sensu” integration type, and copy the integration key.
Then run flyctl checks handlers create --type pagerduty (you might need to run flyctl version update first). It’ll prompt you for your organization and your PagerDuty integration key.
Next, go break stuff.
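If you’d rather see an alert fire without actually breaking your database, one option is a throwaway script check you can fail on demand. A minimal sketch, assuming a hypothetical marker file (`/tmp/fly-force-fail` by default, overridable with an assumed `FORCE_FAIL_FILE` environment variable): the check exits 1 while the file exists and 0 otherwise. You would point a `type = "script"` check at it, the same way `master_elected.sh` is wired up in fly.toml.

```bash
#!/bin/bash
# Hypothetical check you can fail on purpose. FORCE_FAIL_FILE is an assumed
# variable, and the default marker path is made up for this example.
force_fail_check() {
  local marker="${FORCE_FAIL_FILE:-/tmp/fly-force-fail}"
  if [ -e "$marker" ]; then
    # Marker present: say why, and return non-zero so the check alerts.
    echo "forced failure: ${marker} exists"
    return 1
  fi
  echo "ok"
}

# A script's exit status is that of its last command, so the check result
# becomes the script's exit code.
force_fail_check
```

With the check deployed, `touch /tmp/fly-force-fail` should trigger an alert on the next interval, and `rm /tmp/fly-force-fail` should clear it.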
Slack handler
You can also spam your favorite Slack channel when alert status changes.
Just create an incoming Slack webhook and run flyctl checks handlers create --type slack. It’ll prompt you for the webhook URL, channel, username, and user icon URL. @hkfoster made a delightful default icon, though, so I don’t know why you’d ever want your own…
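Before handing the webhook URL to flyctl, it can help to confirm it posts where you expect. Slack incoming webhooks accept a JSON body with a `text` field; this sketch builds that payload and sends it with curl. `slack_payload`, `slack_test_message`, and `SLACK_WEBHOOK_URL` are hypothetical names for illustration (flyctl does not read that variable), and the escaping only covers backslashes and double quotes, so keep test messages to a single line.

```bash
#!/bin/bash
# Build a minimal Slack incoming-webhook payload: {"text":"..."}.
# Escapes backslashes first, then double quotes; newlines are not handled.
slack_payload() {
  local escaped
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text":"%s"}' "$escaped"
}

# Post a one-line test message to the webhook in SLACK_WEBHOOK_URL
# (an assumed variable name). -f makes curl fail on HTTP errors.
slack_test_message() {
  curl -fsS -X POST \
    -H 'Content-type: application/json' \
    --data "$(slack_payload "$1")" \
    "$SLACK_WEBHOOK_URL"
}
```

For example: `export SLACK_WEBHOOK_URL='<your webhook URL>'`, then `slack_test_message "hello from fly"`.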
Pricing for health checks
Do you all have a guess at how many checks you’d want to run per VM? We’re thinking about including three health checks per VM, and then charging $2/mo per VM for up to 15. So if you configure 2 checks in your fly.toml, it’s free. If you configure 10 checks, we’d charge $2/mo for each VM you have running. These charges would be prorated to the life of the VM.
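To make the proposal concrete, here is the arithmetic as a small bash sketch. It assumes costs are tracked in cents and that “prorated to the life of the VM” means hours run divided by a 730-hour month, rounded down; both are assumptions, as are the function names. It refuses to price more than 15 checks, since the proposal doesn’t say what happens past that.

```bash
#!/bin/bash
# Sketch of the proposed pricing: 3 checks per VM included, $2/mo per VM
# for 4 to 15 checks. Above 15 is unspecified, so it returns an error.
# ASSUMPTION: proration uses a 730-hour month and rounds down to whole cents.
HOURS_PER_MONTH=730

# monthly_check_cost_cents CHECKS_PER_VM VM_COUNT
monthly_check_cost_cents() {
  local checks=$1 vms=$2
  if [ "$checks" -le 3 ]; then
    echo 0
  elif [ "$checks" -le 15 ]; then
    echo $(( 200 * vms ))
  else
    echo "pricing above 15 checks per VM is unspecified" >&2
    return 1
  fi
}

# prorated_cost_cents CHECKS_PER_VM HOURS_THE_VM_RAN
prorated_cost_cents() {
  local full
  full=$(monthly_check_cost_cents "$1" 1) || return 1
  echo $(( full * $2 / HOURS_PER_MONTH ))
}
```

With 10 checks per VM, `monthly_check_cost_cents 10 3` gives 600 (three VMs at $2 each), and a single VM with 10 checks that ran for half a month (`prorated_cost_cents 10 365`) comes to 100 cents.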