The problem is still here. See:
We discovered an issue with Stolon that was preventing failed keepers from being cleaned up as expected. This has been patched in release v0.0.29.
You can upgrade to the latest release via:
fly image update --app <app-name>
Also, as a side note:
With this upgrade, you should no longer need to export any additional environment variables to use stolonctl commands.
If you have any questions on this, just let us know!
cc: @iangcarroll @LeoAdamek
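(For anyone landing here later, a rough sketch of the upgrade check, using the usual flyctl commands with a placeholder app name:)

# check which image the app is currently running
fly image show --app <app-name>

# pull in the latest release
fly image update --app <app-name>

# then, from inside a VM, stolonctl should work without the extra exports
fly ssh console --app <app-name>
stolonctl status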
Also:
My Postgres instance is throwing the following error:
2022-11-30T17:58:23.655 app[eb7f1549] maa [info] sentinel | 2022-11-30T17:58:23.654Z WARN cmd/sentinel.go:276 no keeper info available {"db": "ed59a1bd", "keeper": "f0f183782"}
2022-11-30T17:58:30.150 app[eb7f1549] maa [info] sentinel | 2022-11-30T17:58:30.149Z WARN cmd/sentinel.go:276 no keeper info available {"db": "ed59a1bd", "keeper": "f0f183782"}
2022-11-30T17:58:30.150 app[eb7f1549] maa [info] sentinel | 2022-11-30T17:58:30.149Z WARN cmd…
# stolonctl status
=== Active sentinels ===
ID LEADER
1c82763a false
8c0e2b77 false
ab5dd211 true
d1d485eb false
=== Active proxies ===
No active proxies
=== Keepers ===
UID            HEALTHY  PG LISTENADDRESS  PG HEALTHY  PG WANTEDGENERATION  PG CURRENTGENERATION
5adc4818686f2  true     IPV6              true        2                    2
5b66fe867b012  true     IPV6              true        5                    5
97421053fa2    true     IPV6              true        2                    2
f0fe8c8422b2   true     IPV6              true        2                    2
=== Cluster Info ===
Master Keeper: 5b66fe867b012
===== Keepers/DB tree =====
5b66fe867b012 (master)
├─f0fe8c8422b2
├─97421053fa2
└─5adc4818686f2
Logs:
2023-01-03T14:26:54.632 app[918536ef435dd8] cdg [info] sentinel | 2023-01-03T14:26:54.632Z WARN cmd/sentinel.go:276 no keeper info available {"db": "46de1b4e", "keeper": "f0fe8c8422b2"}
2023-01-03T14:27:47.120 app[918536ef435dd8] cdg [info] sentinel | 2023-01-03T14:27:47.119Z WARN cmd/sentinel.go:276 no keeper info available {"db": "a133484a", "keeper": "97421053fa2"}
2023-01-03T14:28:39.508 app[918536ef435dd8] cdg [info] sentinel | 2023-01-03T14:28:39.508Z WARN cmd/sentinel.go:276 no keeper info available {"db": "46de1b4e", "keeper": "f0fe8c8422b2"}
2023-01-03T14:29:47.097 app[918536ef435dd8] cdg [info] sentinel | 2023-01-03T14:29:47.097Z WARN cmd/sentinel.go:276 no keeper info available {"db": "a133484a", "keeper": "97421053fa2"}
2023-01-03T14:30:07.845 app[918536ef435dd8] cdg [info] sentinel | 2023-01-03T14:30:07.844Z WARN cmd/sentinel.go:276 no keeper info available {"db": "46de1b4e", "keeper": "f0fe8c8422b2"}
shaun (January 3, 2023, 3:04pm):
Berndinox: 918536ef435dd8
These should clear after an hour or so. If this is not the case, let us know.
From the linked GitHub issue (opened 20 Sep 2021, closed 3 Oct 2022, labeled "bug"):
Stolon will hold onto dead keepers for up to 48 hours by default. Each standby keeper has an associated replication slot that holds information about the standby's current WAL position. The replication slot basically tells Postgres not to recycle or remove any WAL files the standby still needs, since losing them would prevent it from catching up with the leader. That being said, we should aim to remove dead keepers on a shorter interval to help prevent needless retention of WAL files.
There are other configuration tweaks that we can make to help with this as well, but I'll address those in a separate issue.
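(To make that concrete: the 48-hour figure comes from stolon's deadKeeperRemovalInterval cluster spec option, and you can see which replication slots a dead keeper is still pinning via pg_replication_slots on the leader. A rough sketch of the manual route, using the keeper UID from the earlier logs purely as an example; verify a keeper is actually dead before removing it:)

# remove a dead keeper by hand instead of waiting for automatic cleanup
stolonctl removekeeper f0f183782

# or shorten the cleanup window (the 48h default); "4h" is only an illustrative value
stolonctl update --patch '{"deadKeeperRemovalInterval": "4h"}'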
Hey @shaun - thanks for the fast response!
I'll exercise a little patience, then.
@shaun - unfortunately, bad news:
2023-01-03T22:26:17.628 app[918536ef435dd8] cdg [info] sentinel | 2023-01-03T22:26:17.628Z WARN cmd/sentinel.go:276 no keeper info available {"db": "41d7b1a2", "keeper": "f0fe8c8422b2"}
2023-01-03T22:26:28.037 app[918536ef435dd8] cdg [info] sentinel | 2023-01-03T22:26:28.037Z WARN cmd/sentinel.go:276 no keeper info available {"db": "d740d71c", "keeper": "97421053fa2"}
I'm still getting the errors.
@Berndinox @shaun Can this take Postgres down? I'm seeing the same log, looping on the same message, and the DB is unreachable. I can SSH into it and launch a proxy, but I can't connect, and neither can my Node app.
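(Not claiming this is the cause, but here is a minimal way to check whether Postgres itself answers, assuming the standard fly proxy workflow and the password set when the cluster was created:)

# forward the Postgres port from the app to your machine
fly proxy 5432 -a <app-name>

# in another terminal, try connecting directly
psql "postgres://postgres:<password>@localhost:5432"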
After a day the messages were gone and the database was reachable again. Maybe you're facing another issue, I don't know.
This error message just showed up in one of my apps. The other app was having a DB collation mismatch
(ref: https://community.fly.io/t/postgres-flex-database-postgres-has-a-collation-version-mismatch/14391).
But after running
fly image update --app <my_app_name>
the issue went away.