Hey all!
Starting today (though I only noticed a moment ago!), my app has been throwing this error:
2023-01-31T23:36:56.863 app[1fe46360] sea [info] Error: getaddrinfo ENOTFOUND <hostname>-db.internal
2023-01-31T23:36:56.863 app[1fe46360] sea [info] at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:109:26) {
2023-01-31T23:36:56.863 app[1fe46360] sea [info] errno: -3007,
2023-01-31T23:36:56.863 app[1fe46360] sea [info] code: 'ENOTFOUND',
2023-01-31T23:36:56.863 app[1fe46360] sea [info] syscall: 'getaddrinfo',
2023-01-31T23:36:56.863 app[1fe46360] sea [info] hostname: '<hostname>-db.internal'
2023-01-31T23:36:56.863 app[1fe46360] sea [info] }
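To narrow down whether this is a DNS problem or something in the app itself, one quick check is to resolve the name directly with node:dns (dns.lookup goes through getaddrinfo, the same path the failing connection uses; <hostname> is a placeholder, as in the logs above):

// dns-check.js: sanity-check Fly internal DNS from inside the app VM
const dns = require('node:dns/promises');

dns.lookup('<hostname>-db.internal', { family: 6 }) // .internal names resolve to private IPv6
  .then(({ address }) => console.log('resolved:', address))
  .catch((err) => console.error('lookup failed:', err.code)); // e.g. ENOTFOUND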
It looks like the Postgres DB is unreachable, and I can’t proxy to it from the CLI either. It’s odd, since I haven’t changed a single thing and it was working fine last night. I did do a restart just now to see if it would fix the issue, but alas it did not.
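For reference, the proxy attempt was just the standard command (same placeholder):

flyctl proxy 5432 -a <hostname>-db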
I don’t see any indication of trouble on the status page, so I’m not sure what the problem is. Has this happened to anyone else, and how did you troubleshoot it?
FWIW, here are the errors from the database app itself:
2023-01-31T09:02:15.341 app[908010eb6e6d87] sea [info] sentinel | 2023-01-31T09:02:15.340Z ERROR cmd/sentinel.go:102 election loop error {"error": "Put \"https://consul-na.fly-shared.net/v1/session/create?wait=5000ms\": dial tcp: lookup consul-na.fly-shared.net on [fdaa::3]:53: read udp [fdaa:0:d90d:a7b:2dbb:93d2:d0b2:2]:49023->[fdaa::3]:53: i/o timeout"}
2023-01-31T23:53:35.251 app[908010eb6e6d87] sea [info] keeper | 2023-01-31T23:53:35.245Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}
2023-01-31T23:53:35.570 app[908010eb6e6d87] sea [info] sentinel | 2023-01-31T23:53:35.567Z ERROR cmd/sentinel.go:102 election loop error {"error": "Unexpected response code: 500 (rpc error getting client: failed to get conn: rpc error: lead thread didn't get connection)"}
2023-01-31T23:53:35.966 app[908010eb6e6d87] sea [info] sentinel | 2023-01-31T23:53:35.965Z ERROR cmd/sentinel.go:1889 cannot update sentinel info {"error": "cannot set or renew session for ttl, unable to operate on sessions"}
2023-01-31T23:53:36.119 app[908010eb6e6d87] sea [info] keeper | 2023-01-31T23:53:36.118Z ERROR cmd/keeper.go:870 failed to update keeper info {"error": "cannot set or renew session for ttl, unable to operate on sessions"}
2023-01-31T23:53:46.332 app[908010eb6e6d87] sea [info] keeper | 2023-01-31T23:53:46.331Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}
2023-01-31T23:53:48.340 app[908010eb6e6d87] sea [info] sentinel | 2023-01-31T23:53:48.338Z ERROR cmd/sentinel.go:1852 error retrieving cluster data {"error": "Unexpected response code: 500"}
2023-01-31T23:54:06.864 app[908010eb6e6d87] sea [info] sentinel | panic: close of closed channel
2023-01-31T23:54:06.864 app[908010eb6e6d87] sea [info] sentinel |
2023-01-31T23:54:06.864 app[908010eb6e6d87] sea [info] sentinel | goroutine 31981 [running]:
2023-01-31T23:54:06.866 app[908010eb6e6d87] sea [info] sentinel | github.com/superfly/leadership.(*Candidate).initLock(0xc00010bb90)
2023-01-31T23:54:06.866 app[908010eb6e6d87] sea [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:98 +0x2e
2023-01-31T23:54:06.866 app[908010eb6e6d87] sea [info] sentinel | github.com/superfly/leadership.(*Candidate).campaign(0xc00010bb90)
2023-01-31T23:54:06.866 app[908010eb6e6d87] sea [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:124 +0xc6
2023-01-31T23:54:06.866 app[908010eb6e6d87] sea [info] sentinel | created by github.com/superfly/leadership.(*Candidate).RunForElection
2023-01-31T23:54:06.867 app[908010eb6e6d87] sea [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:60 +0xc5
2023-01-31T23:54:06.871 app[908010eb6e6d87] sea [info] sentinel | exit status 2
2023-01-31T23:54:06.871 app[908010eb6e6d87] sea [info] sentinel | restarting in 3s [attempt 1]
2023-01-31T23:54:09.869 app[908010eb6e6d87] sea [info] sentinel | Running...
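From what I can tell, those sentinel/keeper errors are stolon failing to reach Consul: the lookup of consul-na.fly-shared.net against the internal resolver at [fdaa::3]:53 times out, and that same resolver serves .internal names, which would explain my app’s ENOTFOUND. One way to confirm from inside the VM (standard commands; assuming dig is available in the image):

fly ssh console -a <hostname>-db
# then, inside the VM:
dig +short aaaa consul-na.fly-shared.net @fdaa::3

If that also times out, the problem is on the resolver/Consul side rather than in the cluster itself.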
I tried restarting the app from the CLI:
flyctl apps restart <hostname>-db
→ Error postgres apps should use fly pg restart instead
Fair enough, so I tried…
flyctl pg restart -a <hostname>-db
Identifying cluster role(s)
Machine 908010eb6e6d87: leader
Restarting machine 908010eb6e6d87
Waiting for 908010eb6e6d87 to become healthy (started, 3/3)
Machine 908010eb6e6d87 restarted successfully!
Postgres cluster has been successfully restarted!
failed to release lease for machine 908010eb6e6d87: lease not found
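I’m guessing the lease error at the end is cosmetic, since the restart itself clearly went through. To double-check the cluster afterwards, the standard commands work (same placeholder app name):

flyctl checks list -a <hostname>-db
flyctl pg connect -a <hostname>-db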
And here are the logs from Fly after the restart:
2023-02-01T02:52:47.455 app[908010eb6e6d87] sea [info] keeper | Running...
2023-02-01T02:52:47.686 app[908010eb6e6d87] sea [info] proxy | [WARNING] 031/025247 (563) : bk_db/pg1 changed its IP from (none) to fdaa:0:d90d:a7b:2dbb:93d2:d0b2:2 by flydns/dns1.
2023-02-01T02:52:47.701 app[908010eb6e6d87] sea [info] exporter | INFO[0000] Established new database connection to "fdaa:0:d90d:a7b:2dbb:93d2:d0b2:2:5433". source="postgres_exporter.go:970"
2023-02-01T02:52:55.855 app[908010eb6e6d87] sea [info] keeper | 2023-02-01T02:52:55.854Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-02-01T02:52:59.055 app[908010eb6e6d87] sea [info] keeper | 2023-02-01 02:52:59.054 UTC [592] LOG: database system is ready to accept connections
2023-02-01T02:53:02.728 app[908010eb6e6d87] sea [info] exporter | INFO[0015] Semantic Version Changed on "fdaa:0:d90d:a7b:2dbb:93d2:d0b2:2:5433": 0.0.0 -> 14.4.0 source="postgres_exporter.go:1539"
And it works now, with all the data still there! Not sure what happened, but I’m pretty sure it wasn’t anything I did.
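In case it happens again, I’m also thinking of wrapping the app’s initial DB connection in a retry with backoff, so a transient DNS failure doesn’t take the whole app down. A rough sketch, assuming node-postgres and a DATABASE_URL env var (the retry count and delays are arbitrary, not anything Fly-specific):

// connect-with-retry.js: retry the first connection instead of crashing on ENOTFOUND
const { Pool } = require('pg');

async function connectWithRetry(retries = 5, delayMs = 2000) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    const pool = new Pool({ connectionString: process.env.DATABASE_URL });
    try {
      await pool.query('SELECT 1'); // forces a real connection (and a DNS lookup)
      return pool;
    } catch (err) {
      await pool.end();
      if (attempt === retries) throw err;
      console.warn(`db connect failed (${err.code}); retrying in ${delayMs}ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      delayMs *= 2; // exponential backoff
    }
  }
}

module.exports = { connectWithRetry };

It won’t fix whatever went wrong on the Consul side, but it keeps the app from crash-looping while DNS recovers.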