RESOLVED fixed by redeploying. I found this thread of someone running into the same issue. status.fly.io doesn’t show anything wrong – is there any way for us to set up alerts on fly if things like these happen?
Hi,
approx. 3 hours ago, our Postgres Database suddenly became unavailable through the fly.io DNS.
Our production application is throwing:
2022-07-10T07:36:21Z app[afc1892f] fra [info] Error Can't reach database server at `top2.nearest.of.[db-appname].internal`:`5432`
fly logs -a [db-appname]
shows
2022-07-10T07:40:36Z app[0614d4ac] fra [info]keeper | 2022-07-10T07:40:36.803Z INFO cmd/keeper.go:1504 our db requested role is master
2022-07-10T07:40:36Z app[0614d4ac] fra [info]keeper | 2022-07-10T07:40:36.804Z INFO cmd/keeper.go:1542 already master
2022-07-10T07:40:36Z app[0614d4ac] fra [info]keeper | 2022-07-10T07:40:36.836Z INFO cmd/keeper.go:1675 postgres parameters not changed
2022-07-10T07:40:36Z app[0614d4ac] fra [info]keeper | 2022-07-10T07:40:36.836Z INFO cmd/keeper.go:1702 postgres hba entries not changed
2022-07-10T07:40:36Z app[0614d4ac] fra [info]keeper is healthy, db is healthy, role: master
2022-07-10T07:40:37Z app[0614d4ac] fra [info]configuring operator
2022-07-10T07:40:37Z app[0614d4ac] fra [info]error configuring operator user: can't scan into dest[3]: cannot scan null into *string
2022-07-10T07:40:37Z app[0614d4ac] fra [info]checking stolon status
2022-07-10T07:40:37Z app[0614d4ac] fra [info]keeper is healthy, db is healthy, role: master
2022-07-10T07:40:37Z app[0614d4ac] fra [info]configuring operator
2022-07-10T07:40:37Z app[0614d4ac] fra [info]error configuring operator user: can't scan into dest[3]: cannot scan null into *string
2022-07-10T07:40:38Z app[0614d4ac] fra [info]checking stolon status
2022-07-10T07:40:38Z app[0614d4ac] fra [info]keeper is healthy, db is healthy, role: master
fly ssh console -a [db-appname]
shows
WARN app flag '[db-appname]' does not match app name in config file '[app-appname]'
? Continue using '[db-appname]' Yes
Error host unavailable: host was not found in DNS
fly dig aaaa [db-appname].internal
gives me an empty response:
;; opcode: QUERY, status: NXDOMAIN, id: 25511
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;[db-appname].internal. IN AAAA
fly dig
works for our other apps. The database returns an empty response however.
I’m not sure if this is something on our end, but we haven’t changed anything on our end when things broke. (it was 6am!)