Is Postgres on FRA having issues?

After having issues with deployment on FRA, it seems now that my Postgres instance is also encountering issues:

2022-12-30T10:43:01.248 app[858b7334] fra [info] keeper | 2022-12-30 10:42:50.141 UTC [1262] STATEMENT: SELECT * FROM pg_stat_bgwriter;

2022-12-30T10:43:01.248 app[858b7334] fra [info] keeper | 2022-12-30 10:42:52.009 UTC [1086] LOG

2022-12-30T10:43:01.248 app[858b7334] fra [info] using stale statistics instead of current ones because stats collector is not responding

2022-12-30T10:45:31.619 app[858b7334] fra [info] keeper | 2022-12-30 10:45:07.063 UTC [1338] LOG: using stale statistics instead of current ones because stats collector is not responding

2022-12-30T10:47:30.042 app[858b7334] fra [info] sentinel | 2022-12-30T10:47:30.041Z ERROR cmd/sentinel.go:102 election loop error {"error": "Put \"\": dial tcp: lookup on [fdaa::3]:53: read udp [fdaa:0:7b2e:a7b:23c3:1:583b:2]:44016->[fdaa::3]:53: i/o timeout"}

2022-12-30T10:47:50.953 app[858b7334] fra [info] sentinel | panic: close of closed channel

2022-12-30T10:47:51.014 app[858b7334] fra [info] sentinel |

2022-12-30T10:47:52.942 app[858b7334] fra [info] keeper | 2022-12-30T10:47:52.931Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-12-30T10:48:07.083 app[858b7334] fra [info] sentinel | 2022-12-30T10:48:07.067Z ERROR cmd/sentinel.go:1843 error retrieving cluster data {"error": "Unexpected response code: 500"}

Is there something wrong with the FRA environment?


For the past 2 hours, we’ve also been facing issues trying to deploy new revisions for deployments in the FRA region. Our deployments fail with the following error message:

Failed due to unhealthy allocations - not rolling back to stable job version xx as current job has same specification.

It also seems that other people are facing similar issues: @juliusgoddafrd is not able to roll out new revisions either, according to the post at Postgres commands failing - #8 by juliusgoddafrd.


Also getting deployments failing with the “Failed due to unhealthy allocations” error in FRA. currently says there are issues in FRA, but the message was updated 2 days ago. Any update on when this could be cleared up?


Update: the deploy went through eventually after ~1h of trying.

I’m having issues again :frowning:

@jerome sorry for the direct ping, but are there some issues with FRA? My production Postgres went down for no apparent reason. I tried the following, based on a comment somewhere else:

fly config save --app hXXXXXX-prod-db
fly deploy . --image flyio/postgres:14 --strategy=immediate

But it did not help. The monitoring logs keep looping on:

2023-01-19T18:23:50.829 app[e386dd55] fra [info] sentinel | 2023-01-19T18:23:50.829Z WARN cmd/sentinel.go:276 no keeper info available {"db": "6a889cd3", "keeper": "23c313bbd2"}

Keeper 23c313bbd2 is not healthy (whatever that means :frowning:)

fly ssh console
Update available 0.0.443 -> 0.0.450.
Run "fly version update" to upgrade.
Connecting to fdaa:0:7502:a7b:23c4:1:3bbe:2... complete
root@e386dd55:/# stolonctl status
=== Active sentinels ===

122fbc33	false
3d803733	false
740234c7	false
780612d7	true

=== Active proxies ===

No active proxies

=== Keepers ===

23c313bbd2	false	(unknown)		false		1			0
23c413bbe2	true	fdaa:0:7502:a7b:23c4:1:3bbe:2:5433	true	4	4

=== Cluster Info ===

Master Keeper: 23c413bbe2

===== Keepers/DB tree =====

23c413bbe2 (master)

$ fly pg restart -a hXXXXX-prod-db
Error can't get role for 984e86c6-ad65-8706-a236-14e8754a5203: Get "http://:5500/commands/admin/role": dial: lookup : no such host
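
For anyone else trying to read this kind of output: a minimal sketch of pulling the unhealthy keepers out of a pasted `stolonctl status` dump. It assumes the second whitespace-separated column of each row in the Keepers section is the keeper's health flag, which matches the output above but is an assumption about the layout, not something confirmed here. The function name `unhealthy_keepers` is made up for illustration.

```python
def unhealthy_keepers(status_text):
    """Return IDs of keepers whose second column reads 'false'.

    Assumption: in the "=== Keepers ===" section each row starts with the
    keeper ID followed by a true/false health flag, as in the output above.
    """
    unhealthy = []
    in_keepers = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("==="):
            # Section header, e.g. "=== Keepers ===" or "=== Cluster Info ===".
            in_keepers = stripped.strip("= ").lower() == "keepers"
            continue
        if not in_keepers or not stripped:
            continue
        fields = stripped.split()
        if len(fields) >= 2 and fields[1] == "false":
            unhealthy.append(fields[0])
    return unhealthy


# Sample copied from the Keepers section of the status output above.
sample = """=== Keepers ===

23c313bbd2\tfalse\t(unknown)\t\tfalse\t\t1\t\t\t0
23c413bbe2\ttrue\tfdaa:0:7502:a7b:23c4:1:3bbe:2:5433\ttrue\t4\t4

=== Cluster Info ===

Master Keeper: 23c413bbe2
"""

print(unhealthy_keepers(sample))  # ['23c313bbd2']
```

On the output above this flags only 23c313bbd2, the same keeper the sentinel warning complains about; the master keeper 23c413bbe2 is reported as healthy.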


There does seem to be a current issue in FRA, which would likely explain some of these. Databases have a volume attached, so moving them is presumably more complicated than moving VM-only apps. Keep an eye on:
