Preview: health checks and alerting (deprecated)

I have been reading more about this, @kurt.
I had an issue where one of my Postgres instances went from leader to error.

And it appears that fly.io knows this happened.

May I suggest something?

From a product perspective, the goal is to make sure nothing is failing.
Since we know an instance will sometimes fail, as in the case above, there are two options:
a) Since fly.io knows the instance went into an error state, automatically enable a replica.
b) Notify the owner that the instance just went from leader to error, and let them figure out how to fix it.
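
To make the two options concrete, here is a rough sketch in Python of what I have in mind. Everything in it is hypothetical: `InstanceStatus`, `detect_leader_failure`, `handle_status_change`, and the `notify` and `promote_replica` callbacks are names I made up for illustration, not anything from fly.io's actual API. It only shows the decision logic, assuming the platform can see the role before and after a change.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class InstanceStatus:
    """Hypothetical snapshot of one Postgres instance (not a real fly.io type)."""
    instance_id: str
    role: str  # assumed values: "leader", "replica", "error"


def detect_leader_failure(previous: InstanceStatus,
                          current: InstanceStatus) -> Optional[str]:
    """Return an alert message if the instance went from leader to error."""
    if previous.role == "leader" and current.role == "error":
        return (f"Your PostgreSQL instance {current.instance_id} "
                "just went from leader to error.")
    return None


def handle_status_change(previous: InstanceStatus,
                         current: InstanceStatus,
                         notify: Callable[[str], None],
                         promote_replica: Optional[Callable[[str], None]] = None) -> bool:
    """Apply option (b), and option (a) too when a promote callback is given.

    Returns True if a leader-to-error transition was handled.
    """
    message = detect_leader_failure(previous, current)
    if message is None:
        return False
    notify(message)  # option (b): tell the owner
    if promote_replica is not None:
        promote_replica(current.instance_id)  # option (a): fail over automatically
    return True
```

Passing only `notify` gives option (b); passing `promote_replica` as well gives option (a) on top of it, so the owner is still told what happened.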

A short-term solution:
Since fly.io already knows the status, why not pass that message on to the owner?

A long-term solution could be a message like:
"Hey, your instance for PostgreSQL just went from leader to error. FYI. You should look into it, or better yet, set up automatic…(replica/clone) so no clients are affected."

I would imagine the long-term solution would be a win-win. You notify the owner of the failed instance, the owner doesn’t need to replicate your monitoring system, and you promote deeper usage of your products by pushing users to, let’s say, switch from free mode to paid mode if maintaining 99.99% reliability is one of their goals.

Would this be a potential solution for you?

I’m not sure yet whether I can help, but I would like to get on a call to walk through the criteria you use when making a decision like this.

Here is my calendar, so you don’t have to chase me or vice versa:
https://calendar.app.google/obKKg1UjgCCR77aw9

Thanks
Lucas