Many users leveraging the PG Flex implementation have been encountering issues with the manual failover process and various internal repmgr commands. The reason for these issues is the expiration of the underlying SSH certificates required for these operations…
To see if your certificates have expired, you can run the following command within your Machine:
Valid: from 2024-05-29T15:18:03 to 2124-05-30T16:18:03
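For anyone who wants to script this check, here is a minimal sketch. It assumes the validity line comes from `ssh-keygen -L -f <cert>`, which prints a `Valid: from … to …` line in this format. The `cert_status` helper and the certificate path in the comment are hypothetical names, not part of flyctl or the Postgres image.

```shell
# cert_status: hypothetical helper, not part of flyctl or repmgr.
# Reads `ssh-keygen -L` style output on stdin and reports whether the
# certificate has expired. Optional $1 overrides "now" (same timestamp format).
cert_status() {
  cs_now="${1:-$(date +%Y-%m-%dT%H:%M:%S)}"
  cs_line=$(grep -m1 'Valid:') || { echo "no Valid: line found"; return 2; }
  case "$cs_line" in
    *forever*) echo "valid forever"; return 0 ;;
  esac
  # Keep only the timestamp that follows " to ".
  cs_expiry="${cs_line##* to }"
  # Drop separators so both timestamps become 14-digit integers.
  cs_e=$(printf '%s' "$cs_expiry" | tr -d 'T:-')
  cs_n=$(printf '%s' "$cs_now" | tr -d 'T:-')
  if [ "$cs_e" -lt "$cs_n" ]; then
    echo "expired $cs_expiry"
    return 1
  fi
  echo "valid until $cs_expiry"
}

# Usage on a Machine (the certificate path is an assumption, adjust as needed):
#   ssh-keygen -L -f /path/to/id_rsa-cert.pub | cert_status
```

The comparison treats both timestamps as local time, which is how `ssh-keygen -L` prints them, so run it on the Machine itself rather than on a workstation in a different time zone.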
To help you address this issue, we have introduced a new fly pg renew-certs command that will renew these certificates for you.
$ fly pg renew-certs --help
Usage:
  fly postgres renew-certs [flags]

Flags:
  -a, --app string      Application name
  -c, --config string   Path to application configuration file
  -h, --help            help for renew-certs
      --valid-days int  The number of days the certificate should be valid for. (default 36525)
Once the certificates have been renewed, you will need to issue a new App deploy to apply them to your existing Machines. If you have never performed a deploy for your Postgres App, no problem: instructions will be provided once the certificate renewal command has completed.
Thanks so much for this, it seems to help with long-standing failover problems that we’ve been having. Once we get new 100 year certs should we expect this problem to be solved for good? It’s unclear to me why this became a problem in the first place - were certs previously issued for shorter periods of time, were they supposed to auto-renew and didn’t, or were we supposed to renew them and missed some documentation?
@shaun Thank you! I second @elliotdickison’s question – it looks like the cert on our db nodes was valid for 1 day + 1 hour. I don’t remember specifying anything related to certs, and I would never have set it so low, so my guess is that this was the default at the time. Has the default been changed, or is there a way to specify how long the certs will be valid when we create new postgres apps?
Valid: from 2024-03-22T04:43:13 to 2024-03-23T05:43:13
Were certs previously issued for shorter periods of time, were they supposed to auto-renew and didn’t, or were we supposed to renew them and missed some documentation?
Honestly, this seems like it was an oversight on our end. It was a little hard to believe at first given how long the feature has been out, but I don’t have any evidence that indicates otherwise.
Has the default been changed, or is there a way to specify how long the certs will be valid when we create new postgres apps?
The default has been changed to 100 years. If you need something different, you can use the command mentioned above and manually specify --valid-days.
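As a rough illustration of picking a custom value, the sketch below converts years to days at 365.25 days per year (which is how the default of 36525 lines up with 100 years) and prints the resulting command instead of running it. The helper names and the app name `my-pg-app` are placeholders; only the `-a` and `--valid-days` flags come from the help output above.

```shell
# years_to_days: hypothetical helper. Uses 365.25 days per year (integer
# division), so 100 years gives the default of 36525.
years_to_days() {
  echo $(( $1 * 36525 / 100 ))
}

# renew_cmd: prints (does not run) the renew command for an app and a
# validity period in years. Review the output before running it yourself.
renew_cmd() {
  echo "fly pg renew-certs -a $1 --valid-days $(years_to_days "$2")"
}

renew_cmd my-pg-app 10
```

For a 10-year certificate this prints `fly pg renew-certs -a my-pg-app --valid-days 3652`.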
@shaun Ok thanks for clarifying, just wanted to make sure we weren’t the only ones missing something. On the bright side we’ve learned how to do emergency maintenance with repmgr directly. Thanks very much for the fix, we’re looking forward to the smoother experience.