Preview: health checks and alerting (deprecated)

lucasmanual · October 18, 2023, 8:01pm

Reading more on this, @kurt:
I had an issue where one of the Postgres instances went from leader to error.

And it appears that fly.io knows it did that.

May I suggest something?

From the product perspective, the goal is to make sure nothing is failing.
Since we know there will be cases like the one above, where an instance fails, we have two options:
a) Since fly.io knows the instance went into an error state, automatically enable a replica.
b) Notify the owner that the instance just went from leader to error, and let them figure out how to fix it.

A short term solution:
Since fly.io already knows the status, why not pass the message along?

A long term solution could be:
Hey, your PostgreSQL instance just went from leader to error. FYI. You should look into it or, better yet, set up automatic…(replica/clone) so no clients are affected.
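
To make the idea concrete, here is a rough sketch of the kind of notification I mean. It is only an illustration: the function names, the status strings "leader" and "error", and the `notify` callback are all made up for this example and are not part of any fly.io API.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StatusChange:
    """A hypothetical record of one instance changing status."""
    instance: str
    previous: str
    current: str


def detect_failure(instance: str, previous: str, current: str) -> Optional[StatusChange]:
    """Return a StatusChange only for the leader -> error transition."""
    if previous == "leader" and current == "error":
        return StatusChange(instance, previous, current)
    return None


def build_message(change: StatusChange, has_replica: bool) -> str:
    """Compose the text sent to the owner of the instance."""
    message = (
        f"Your PostgreSQL instance {change.instance} just went from "
        f"{change.previous} to {change.current}. You should look into it."
    )
    if not has_replica:
        # Assumption: the platform can tell whether a replica exists.
        message += " Consider setting up a replica so clients are not affected."
    return message


def handle_status(
    instance: str,
    previous: str,
    current: str,
    has_replica: bool,
    notify: Callable[[str], None],
) -> Optional[StatusChange]:
    """Notify the owner when an instance goes from leader to error."""
    change = detect_failure(instance, previous, current)
    if change is not None:
        notify(build_message(change, has_replica))
    return change


if __name__ == "__main__":
    # Here `print` stands in for whatever delivery channel is used (email, webhook, ...).
    handle_status("db-1", "leader", "error", has_replica=False, notify=print)
```

The point is just that the status transition is already known on the platform side, so the only missing piece is forwarding it to the owner.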

I would imagine the long-term solution would be a win-win. Not only do you notify the owner of the failed instance, but the owner doesn’t need to replicate your monitoring system, and you promote deeper usage of your products by pushing users to, say, switch from the free mode to a paid mode if maintaining 99.99% reliability is one of their goals.

Would this be a potential solution for you?

I’m not sure yet whether I can help you, but I would like to get on a call to walk through the criteria you use when making a decision like this.

So you don’t have to chase me, or vice versa:
https://calendar.app.google/obKKg1UjgCCR77aw9

Thanks
Lucas

