After having deployment issues in FRA, it now seems that my Postgres instance is also running into problems:
2022-12-30T10:43:01.248 app[858b7334] fra [info] keeper | 2022-12-30 10:42:50.141 UTC [1262] STATEMENT: SELECT * FROM pg_stat_bgwriter;
2022-12-30T10:43:01.248 app[858b7334] fra [info] keeper | 2022-12-30 10:42:52.009 UTC [1086] LOG
2022-12-30T10:43:01.248 app[858b7334] fra [info] using stale statistics instead of current ones because stats collector is not responding
2022-12-30T10:45:31.619 app[858b7334] fra [info] keeper | 2022-12-30 10:45:07.063 UTC [1338] LOG: using stale statistics instead of current ones because stats collector is not responding
2022-12-30T10:47:30.042 app[858b7334] fra [info] sentinel | 2022-12-30T10:47:30.041Z ERROR cmd/sentinel.go:102 election loop error {"error": "Put \"https://consul-fra.fly-shared.net/v1/session/create?wait=5000ms\": dial tcp: lookup consul-fra.fly-shared.net on [fdaa::3]:53: read udp [fdaa:0:7b2e:a7b:23c3:1:583b:2]:44016->[fdaa::3]:53: i/o timeout"}
2022-12-30T10:47:50.953 app[858b7334] fra [info] sentinel | panic: close of closed channel
2022-12-30T10:47:51.014 app[858b7334] fra [info] sentinel |
2022-12-30T10:47:52.942 app[858b7334] fra [info] keeper | 2022-12-30T10:47:52.931Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}
2022-12-30T10:48:07.083 app[858b7334] fra [info] sentinel | 2022-12-30T10:48:07.067Z ERROR cmd/sentinel.go:1843 error retrieving cluster data {"error": "Unexpected response code: 500"}
Is there something wrong with the FRA environment?
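The sentinel error above is a DNS lookup timeout for `consul-fra.fly-shared.net`, so one quick check is whether that name resolves at all from inside the VM. Below is a minimal Python sketch, assuming it is run from a machine on the same private network and that Python is available there. The helper name `can_resolve` and the default port are illustrative and do not come from any Fly tooling.

```python
import socket


def can_resolve(hostname: str, port: int = 443) -> bool:
    """Return True if the system resolver returns at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, port)) > 0
    except OSError:  # socket.gaierror (lookup failure) is a subclass of OSError
        return False


if __name__ == "__main__":
    # Hostname copied from the sentinel error in the logs above.
    host = "consul-fra.fly-shared.net"
    print(host, "resolves" if can_resolve(host) else "does NOT resolve")
```

If the lookup fails here too, the problem is more likely in the region's internal DNS than in the Postgres app itself. A successful lookup does not rule out other platform issues.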
For the past 2 hours, we’ve also been facing issues trying to deploy new revisions for deployments in the FRA region. Our deployments fail with the following error message:
Failed due to unhealthy allocations - not rolling back to stable job version xx as current job has same specification.
@jerome sorry for the direct ping, but are there issues with FRA? My production Postgres went down for no apparent reason. Following a comment elsewhere, I tried this:
$ fly pg restart -a hXXXXX-prod-db
Error can't get role for 984e86c6-ad65-8706-a236-14e8754a5203: Get "http://:5500/commands/admin/role": dial: lookup : no such host
But it did not help. The monitoring logs keep looping on:
2023-01-19T18:23:50.829 app[e386dd55] fra [info] sentinel | 2023-01-19T18:23:50.829Z WARN cmd/sentinel.go:276 no keeper info available {"db": "6a889cd3", "keeper": "23c313bbd2"}
There does seem to be a current issue in FRA, which would likely explain some of these. Databases have a volume, so moving them is presumably more complicated than moving VM-only apps. Keep an eye on: