Is Postgres on FRA having issues?

binajmen · December 30, 2022, 10:54am

After having issues with deployment on FRA, it seems now that my Postgres instance is also encountering issues:


2022-12-30T10:43:01.248 app[858b7334] fra [info] keeper | 2022-12-30 10:42:50.141 UTC [1262] STATEMENT: SELECT * FROM pg_stat_bgwriter;

2022-12-30T10:43:01.248 app[858b7334] fra [info] keeper | 2022-12-30 10:42:52.009 UTC [1086] LOG

2022-12-30T10:43:01.248 app[858b7334] fra [info] using stale statistics instead of current ones because stats collector is not responding

2022-12-30T10:45:31.619 app[858b7334] fra [info] keeper | 2022-12-30 10:45:07.063 UTC [1338] LOG: using stale statistics instead of current ones because stats collector is not responding

2022-12-30T10:47:30.042 app[858b7334] fra [info] sentinel | 2022-12-30T10:47:30.041Z ERROR cmd/sentinel.go:102 election loop error {"error": "Put \"https://consul-fra.fly-shared.net/v1/session/create?wait=5000ms\": dial tcp: lookup consul-fra.fly-shared.net on [fdaa::3]:53: read udp [fdaa:0:7b2e:a7b:23c3:1:583b:2]:44016->[fdaa::3]:53: i/o timeout"}

2022-12-30T10:47:50.953 app[858b7334] fra [info] sentinel | panic: close of closed channel

2022-12-30T10:47:51.014 app[858b7334] fra [info] sentinel |

2022-12-30T10:47:52.942 app[858b7334] fra [info] keeper | 2022-12-30T10:47:52.931Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-12-30T10:48:07.083 app[858b7334] fra [info] sentinel | 2022-12-30T10:48:07.067Z ERROR cmd/sentinel.go:1843 error retrieving cluster data {"error": "Unexpected response code: 500"}

Is there something wrong with the FRA environment?
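For anyone skimming similar logs, here is a rough Python sketch that pulls out only the lines containing an `ERROR` entry or a `panic:`. The line layout in `LINE_RE` is inferred purely from the excerpts quoted above and is an assumption, not a documented log format; `find_problems` is a hypothetical helper name.

```python
import re

# Assumed line shape, inferred from the log excerpts quoted in this thread:
# <timestamp> app[<instance>] <region> [<level>] <component> | <message>
# The "<component> |" part is optional because some quoted lines lack it.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) app\[(?P<instance>\w+)\] (?P<region>\w+) \[(?P<level>\w+)\] "
    r"(?:(?P<component>\w+) \| ?)?(?P<message>.*)$"
)


def find_problems(lines):
    """Return (component, message) pairs for lines with an ERROR entry or a panic."""
    problems = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not in the assumed shape; skip rather than guess
        message = match.group("message")
        if " ERROR " in message or message.startswith("panic:"):
            problems.append((match.group("component"), message))
    return problems


if __name__ == "__main__":
    sample = [
        "2022-12-30T10:47:50.953 app[858b7334] fra [info] sentinel | panic: close of closed channel",
        "2022-12-30T10:43:01.248 app[858b7334] fra [info] using stale statistics instead of current ones because stats collector is not responding",
    ]
    for component, message in find_problems(sample):
        print(component, "->", message)
```

This only filters text; it says nothing about the root cause, which in the excerpts above appears to start with the DNS lookup timeout for the Consul host.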

nicoh · December 30, 2022, 11:11am

For the past 2 hours, we’ve also been facing issues trying to deploy new revisions for deployments in the FRA region. Our deployments fail with the following error message:

Failed due to unhealthy allocations - not rolling back to stable job version xx as current job has same specification.

Other people seem to be facing similar issues; for example, @juliusgoddafrd is also unable to roll out new revisions, according to the post at Postgres commands failing - #8 by juliusgoddafrd.

Nik · December 30, 2022, 12:55pm

My deployments in FRA are also failing with the “Failed due to unhealthy allocations” error.

status.fly.io currently says there are issues in FRA, but the message was last updated 2 days ago. Any update on when this could be cleared up?

Nik · December 30, 2022, 1:16pm

Update: the deploy eventually went through after ~1h of trying.

binajmen · January 19, 2023, 6:24pm

I’m having issues again.

@jerome sorry for the direct ping, but are there issues with FRA? My production Postgres went down for no apparent reason. I tried the following, based on a comment elsewhere:

fly config save --app hXXXXXX-prod-db
fly deploy . --image flyio/postgres:14 --strategy=immediate

But it did not help. The monitoring logs keep looping on:

2023-01-19T18:23:50.829 app[e386dd55] fra [info] sentinel | 2023-01-19T18:23:50.829Z WARN cmd/sentinel.go:276 no keeper info available {"db": "6a889cd3", "keeper": "23c313bbd2"}

binajmen · January 19, 2023, 6:36pm

Keeper 23c313bbd2 is not healthy (whatever that means)

fly ssh console
Update available 0.0.443 -> 0.0.450.
Run "fly version update" to upgrade.
Connecting to fdaa:0:7502:a7b:23c4:1:3bbe:2... complete
root@e386dd55:/# stolonctl status
=== Active sentinels ===

ID		LEADER
122fbc33	false
3d803733	false
740234c7	false
780612d7	true

=== Active proxies ===

No active proxies

=== Keepers ===

UID         HEALTHY  PG LISTENADDRESS                    PG HEALTHY  PG WANTEDGENERATION  PG CURRENTGENERATION
23c313bbd2  false    (unknown)                           false       1                    0
23c413bbe2  true     fdaa:0:7502:a7b:23c4:1:3bbe:2:5433  true        4                    4

=== Cluster Info ===

Master Keeper: 23c413bbe2

===== Keepers/DB tree =====

23c413bbe2 (master)
└─23c313bbd2

root@e386dd55:/#
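In case it helps someone check this without reading the table by eye, here is a minimal Python sketch that lists keepers whose HEALTHY or PG HEALTHY column is not `true`. It assumes the plain-text layout shown in the output above (a `=== Keepers ===` section with six whitespace-separated values per row); `unhealthy_keepers` is a hypothetical helper, not part of stolonctl.

```python
def unhealthy_keepers(status_text):
    """Return UIDs of keepers whose HEALTHY or PG HEALTHY value is not 'true'.

    Assumes the text layout of the `stolonctl status` output pasted above.
    """
    unhealthy = []
    in_keepers = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("==="):
            # Section markers look like "=== Keepers ===".
            in_keepers = line.strip("= ").lower() == "keepers"
            continue
        if not in_keepers or not line or line.startswith("UID"):
            continue
        fields = line.split()
        if len(fields) != 6:
            continue  # unexpected row shape; skip rather than guess
        uid, healthy, _address, pg_healthy = fields[:4]
        if healthy != "true" or pg_healthy != "true":
            unhealthy.append(uid)
    return unhealthy


if __name__ == "__main__":
    import sys

    # Example: stolonctl status | python3 check_keepers.py
    print(unhealthy_keepers(sys.stdin.read()))
```

For the output above, this would flag only 23c313bbd2, the standby keeper, while the master 23c413bbe2 reports healthy.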

binajmen · January 19, 2023, 6:42pm

$ fly pg restart -a hXXXXX-prod-db
Error can't get role for 984e86c6-ad65-8706-a236-14e8754a5203: Get "http://:5500/commands/admin/role": dial: lookup : no such host
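One thing that stands out in that error is the URL itself: `http://:5500/commands/admin/role` has a port but nothing before it. A quick way to confirm this is Python's standard URL parser. This is only a sketch to illustrate the empty host; why the CLI ended up without an address for the instance is not something the error tells us, and `describe_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Return (hostname, port) as parsed by the standard library."""
    parts = urlsplit(url)
    return parts.hostname, parts.port


if __name__ == "__main__":
    host, port = describe_url("http://:5500/commands/admin/role")
    if host is None:
        print(f"URL has no host (port {port}), so a DNS lookup cannot succeed")
    else:
        print(f"host={host} port={port}")
```

That would be consistent with the `lookup : no such host` part of the message, where the name being looked up is blank.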

greg · January 19, 2023, 7:21pm

Hi,

There does seem to be a current issue in FRA, which would likely explain some of these errors. Databases have an attached volume, so moving them is presumably more complicated than moving VM-only apps. Keep an eye on:
