Launch: health checks and alerting (help us with pricing model)

We’re launching this next week. If you’re following along here, you get to try it first (and give us pricing feedback!).

You can now install health checks in Fly apps, and configure them to alert you via PagerDuty or Slack when they fail.

This means you can configure your apps to wake you up at all hours of the night when things go wrong. And then fix them so you can sleep the next night. For example, if you’re building a PostgreSQL cluster and it somehow loses its leader, you might be greeted with:

A little DBA elbow grease and you can satisfy PagerDuty:

Configuring health checks

Health checks can be scripts, TCP connections, or HTTP tests. The fly.toml in our example PostgreSQL app looks like this:

# stop accepting new connections while existing sessions complete
kill_signal = "SIGTERM"
# allow 5 minutes to cleanly shutdown
kill_timeout = 300

[checks.master-elected]
type = "script"
interval = 5000
command = "/fly/master_elected.sh"
restart_limit = 0

This calls a script named /fly/master_elected.sh every 5 seconds. If that script exits with a 0, no alerts. If it exits with 1 or greater, alerts! Here’s what this one actually does:
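To make the exit-code convention concrete, here is a minimal sketch of a wrapper a script check could use. The name run_check and its messages are made up for illustration and are not part of flyctl or the platform; the only thing taken from the post is that exit 0 means no alert and anything else means an alert.

```shell
#!/bin/bash
# run_check is a hypothetical helper: it runs a probe command and maps the
# result onto the convention above (0 = passing, non-zero = alert).
run_check() {
    local name="$1"
    shift
    # Run the probe quietly; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
        echo "${name}: passing"
        return 0
    fi
    echo "${name}: failing"
    return 1
}
```

A check script built on this could end with something like `run_check data-volume test -d /data; exit $?`, so the script's own exit status is what the health check sees.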

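#! /bin/bash
set -e
# load the environment saved in /data/.env
export $(cat /data/.env | xargs)

# ask stolon for the cluster status and pull out the master keeper's name
status=$(stolonctl status)
# the space in "$( (" keeps the shell from reading this as arithmetic expansion
mk=$( (echo "$status" | grep "Master Keeper" | awk '{print $3}') || echo "")

# no master keeper: print the full status for debugging and fail the check
if [ -z "$mk" ]; then
    echo "${status}"
    exit 2
fi

# find the master keeper's listen address and strip the :5432 port
ip=$(echo "$status" | grep "^${mk}" | awk '{print $3}' | sed -e 's/:5432$//')

# this VM's own private address
self=$(grep fly-local-6pn /etc/hosts | cut -f 1)

if [ "$ip" == "$self" ]; then
    echo "Master: self"
    exit 0
fi

# another VM is the master, which is still healthy: fall through and exit 0
echo "Master: $ip"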
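If the parsing in that script is hard to follow, the sketch below repeats the same grep/awk/sed steps on a canned string. The helper name master_ip and the sample text are assumptions for illustration: the sample is made up and heavily simplified, shaped only by what the script's field lookups imply, and is not real stolonctl output.

```shell
#!/bin/bash
# master_ip is a hypothetical helper that repeats the script's parsing steps
# on status text passed as its first argument.
master_ip() {
    local status="$1" mk
    # Third field of the "Master Keeper" line is the keeper's name.
    mk=$(printf '%s\n' "$status" | grep "Master Keeper" | awk '{print $3}')
    if [ -z "$mk" ]; then
        return 2   # no master keeper listed
    fi
    # Third field of the keeper's own line is its address; drop the port.
    printf '%s\n' "$status" | grep "^${mk}" | awk '{print $3}' | sed -e 's/:5432$//'
}

# Made-up, simplified status text (NOT real stolonctl output).
sample_status='keeper0 true 10.0.0.5:5432 true
keeper1 true 10.0.0.6:5432 true
Master Keeper: keeper1'

master_ip "$sample_status"   # prints 10.0.0.6
```

When no "Master Keeper" line is present the function returns 2, matching the exit code the check script uses for a leaderless cluster.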

PagerDuty Handler

Configuring PagerDuty alerts is simple. Create a “Service” in PagerDuty, choose the “Sensu” integration type, and copy the key:

Then run flyctl checks handlers create --type pagerduty (you might need to flyctl version update first). It’ll prompt you for your organization and your PagerDuty integration key.

Next, go break stuff.

Slack Handler

You can also spam your favorite Slack channel when alert status changes:

Just create an incoming Slack webhook, and run flyctl checks handlers create --type slack. It’ll prompt you for the webhook URL, channel, username, and user icon URL. @kyle made a delightful default icon, though, so I don’t know why you’d ever want your own …

Pricing for health checks

Do you all have a guess at how many checks you’d want to run per VM? We’re thinking about including three health checks per VM for free, and then charging $2/mo for up to 15. So if you configure 2 checks in your fly.toml, it’s free. If you configure 10 checks, we’d charge $2/mo for each VM you have running. These charges would be prorated to the life of the VM.
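To sanity-check the numbers, here is a rough sketch of the proposal as arithmetic. The function name monthly_check_cost is made up, it ignores proration, and it treats more than 15 checks as an error because the proposal doesn’t say what happens there.

```shell
#!/bin/bash
# monthly_check_cost CHECKS_PER_VM VM_COUNT
# Hypothetical helper: prints the monthly cost in dollars under the proposal
# (up to 3 checks per VM free, 4 to 15 checks at $2/mo per VM).
monthly_check_cost() {
    local checks="$1" vms="$2"
    if [ "$checks" -le 3 ]; then
        echo 0
    elif [ "$checks" -le 15 ]; then
        echo $((2 * vms))
    else
        # The proposal does not cover this case, so refuse to guess.
        echo "more than 15 checks per VM is not covered by the proposal" >&2
        return 1
    fi
}

monthly_check_cost 2 4    # prints 0
monthly_check_cost 10 3   # prints 6
```

So 10 checks on an app with 3 VMs would come to $6/mo, while 2 checks cost nothing no matter how many VMs are running.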

This is AMAZING :heart_eyes:

Could you support Discord webhook through slack compatibility?

Discord allows you to use a webhook with Slack compatibility by adding /slack at the end of the URL.

It seems like it doesn’t work when adding this URL:

Error Invalid Slack webhook URL
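For anyone else trying this, the /slack suffix described above can be added with a small helper before pasting the URL into the handler prompt. This is only a sketch: slack_compatible_url is a made-up name, and the ID and TOKEN parts of the example URL are placeholders.

```shell
#!/bin/bash
# slack_compatible_url is a hypothetical helper: it appends /slack to a
# Discord webhook URL unless the URL already ends with it.
slack_compatible_url() {
    local url="${1%/}"   # drop a single trailing slash, if any
    case "$url" in
        */slack) printf '%s\n' "$url" ;;
        *)       printf '%s\n' "${url}/slack" ;;
    esac
}

slack_compatible_url "https://discord.com/api/webhooks/ID/TOKEN"
# prints https://discord.com/api/webhooks/ID/TOKEN/slack
```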

That should already work! Let me do some testing, good catch.

@emiliendevos give it a try now? I just relaxed the URL validation.

It still doesn’t work unfortunately.

Should I update flyctl?

My URL looks like this one: https://discord.com/api/webhooks/799672577307443270/oZ2Sg1_evyxLcXXfwv1yjh1itd3xZ-L8a5dvTTGKyEufCjzbGcXPH2cIX0LOEzmBpepU/slack

Huh, I just used that exact URL and it worked fine. Will you try again in a few minutes? I wonder if you hit an old version of our API that hadn’t been drained yet.

Also that isn’t a real webhook URL is it? If it is we can edit it out of your post so no one else stumbles across it.

Don’t worry it’s a deleted webhook that I created just for the demo.

I can try again in 10 minutes.

I can confirm that it’s working fine now after waiting 10 minutes.

I don’t understand the pricing on this one :ok_man:t5: at all.

Ahhh! Don’t worry about it, the first pricing idea we had was garbage. We’ll come up with something else. :slight_smile:
