What’s this about?
Over the last year, we have been seeing more and more Postgres apps go down due to unstable connections with our multi-tenant Consul service. Stolon, the open-source solution that we have been using for HA management, requires an always-stable connection with our Consul service. When that connection becomes unstable, Postgres clusters start becoming inaccessible. We have been pretty disappointed with how this has been impacting our users and decided it was time to try a new approach.
This brings us to our upcoming iteration of Postgres, which runs EDB’s repmgr at its core. Repmgr is super lightweight and offers a suite of tools for managing replication and failovers. What it doesn’t offer, however, is any strong opinions on how a cluster should be managed. This has some appeal, as it allows us to really dial in how we feel a Postgres cluster should be managed. This is still very much a work in progress, but we feel we are at a point where we’d like to start getting feedback.
The project can be found here: GitHub - fly-apps/postgres-flex: Postgres HA setup using repmgr
Major Changes
### Consul usage has been significantly reduced
Active Postgres clusters will no longer see interruptions in the event of a Consul outage!
That being said, there are a few things that will still be impacted:
- Horizontal scaling
- Configuration updates made via `fly pg config update`
This is still annoying, but a pretty significant improvement from a stability standpoint.
### PGBouncer is part of the base setup
PGBouncer has been a pretty common feature request and is now part of the base topology. Configuration options are pretty limited at the moment, but more should be available soon!
### Quorum requirements
Unlike our Stolon implementation, quorum must be met in order to achieve HA. Basically, you should plan on running at least 3 members if you want any sort of HA guarantees. We are looking to support 2 + 1 setups in the near future, which will allow you to run 2 standard members plus a lightweight “witness” member that’ll just be there to meet quorum requirements and protect against split-brain.
In the event that quorum cannot be met, the cluster will go read-only.
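To make the "at least 3 members" advice concrete, here is a minimal sketch of the arithmetic, assuming quorum means a strict majority of voting members. That majority rule is an assumption for illustration, and the function names `quorum_size` and `tolerated_failures` are hypothetical; this is not the project's actual logic.

```python
def quorum_size(voters: int) -> int:
    """Smallest number of voters that forms a strict majority (assumed rule)."""
    if voters < 1:
        raise ValueError("a cluster needs at least one voting member")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority can still be reached."""
    return voters - quorum_size(voters)


if __name__ == "__main__":
    for voters in (1, 2, 3, 5):
        print(
            f"{voters} voters: quorum={quorum_size(voters)}, "
            f"tolerated failures={tolerated_failures(voters)}"
        )
```

Under this assumption, a 2-member cluster cannot lose a single member without losing quorum, while 3 voters (either 3 standard members or a 2 + 1 setup with a witness) can lose one.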
What features are missing?
- The `fly pg failover` command is not supported quite yet, but should be available soon.
- We have not yet added support for Postgres extensions, e.g. PostGIS, TimescaleDB, etc.
Getting started
Make sure you are running the latest version of `flyctl`, or at least version v0.0.455.
Specify the `--flex` flag when provisioning your next Postgres app to test out the new implementation:
flyctl pg create --name <app-name> --flex
Warning: This should not be used in production quite yet.
Questions/Feedback
If you have any questions or feedback, please let us know! We’d love to hear from you!
Issues
If you encounter any issues, please let us know! You can reply in this thread, or submit an issue here: GitHub - fly-apps/postgres-flex: Postgres HA setup using repmgr