Hello dear community,
My app went down, so I went in to check why. It seems the DB server is unreachable, with the error: no active leader found. After some investigation, it looks like the “image” needs to be updated.
This is what I get when I check the status of the DB app:
Updates available:
Machine "e784e2e0" flyio/postgres-flex:15.3 (v0.0.42) -> flyio/postgres-flex:15.3 (v0.0.45)
Run `flyctl image update` to migrate to the latest image version.
ID | STATE | ROLE | REGION | CHECKS | IMAGE | CREATED | UPDATED
e784e2e0 | started | error | sea | 3 total, 1 passing, 2 critical | flyio/postgres-flex:15.3 (v0.0.42) | 2023-07-02T06:18:23Z | 2023-10-08T16:25:19Z
I tried running `flyctl image update` and got the following response:
The following changes will be applied to all Postgres machines.
Machines not running the official Postgres image will be skipped.
... // 85 identical lines
}
},
- "image": "flyio/postgres-flex:15.3@sha256:c380a6108f9f49609d64e5e83a3117397ca3b5c3202d0bf0996883ec3d",
+ "image": "registry-1.docker.io/flyio/postgres-flex:15.3@sha256:5e5fc53decb051f69b0850f0f5d137c92343fcd1131ec413015e526062",
"restart": {
"policy": "on-failure",
... // 8 identical lines
? Apply changes? Yes
Identifying cluster role(s)
Machine e784e2e0: error
Postgres cluster has been successfully updated!
But nothing works. Restarting, stopping, scaling, redeploying, connecting… none of it helps. These are the current health checks:
Health Checks for basira-website-db
NAME | STATUS | MACHINE | LAST UPDATED | OUTPUT
-------*----------*----------------*--------------*----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pg | critical | e784e2e0c47198 | 48m49s ago | 500 Internal Server Error
| | | | failed to connect with local node: failed to connect to `host=fdaa:2:70e4:a7b:f9:54c9:92f8:2 user=flypgadmin database=postgres`: dial error (dial tcp [fdaa:2:70e4:a7b:f9:54c9:92f8:2]:5433: connect: connection refused)
-------*----------*----------------*--------------*----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
role | critical | e784e2e0c47198 | 47m55s ago | 500 Internal Server Error
| | | | failed to connect to local node: failed to connect to `host=fdaa:2:70e4:a7b:f9:54c9:92f8:2 user=repmgr database=repmgr`: dial error (dial tcp [fdaa:2:70e4:a7b:f9:54c9:92f8:2]:5433: connect: connection refused)
-------*----------*----------------*--------------*----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
vm | passing | e784e2e0c47198 | 47m47s ago | [✓] checkDisk: 864.23 MB (87.7%) free space on /data/ (88.48µs)
| | | | [✓] checkLoad: load averages: 0.00 0.00 0.00 (82.78µs)
| | | | [✓] memory: system spent 0s of the last 60s waiting on memory (66.44µs)
| | | | [✓] cpu: system spent 144ms of the last 60s waiting on cpu (51.75µs)
| | | | [✓] io: system spent 198ms of the last 60s waiting on io (29.96µs)
-------*----------*----------------*--------------*----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
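A side note on reading the two critical checks: as far as I understand, "connection refused" usually means the machine itself answered but nothing is listening on port 5433, so Postgres is probably not running on it, whereas a timeout would point more toward a networking problem. Below is a minimal Python sketch of that distinction. `probe_port` is a hypothetical helper of my own, not part of flyctl, and it assumes it is run from somewhere that can reach the machine's private IPv6 address.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt.

    Returns one of: "open", "refused", "timeout", "error".
    Hypothetical helper for illustration only; not part of flyctl.
    """
    try:
        # create_connection resolves the host (IPv4 or IPv6) and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host replied, but no process is listening on this port.
        return "refused"
    except socket.timeout:
        # No reply within the timeout; more likely a routing or firewall issue.
        return "timeout"
    except OSError:
        # Anything else (unreachable network, DNS failure, ...).
        return "error"


if __name__ == "__main__":
    # Address and port copied from the health check output above.
    print(probe_port("fdaa:2:70e4:a7b:f9:54c9:92f8:2", 5433))
```

Given the health check output above, I would expect this to report "refused" when run from inside the private network, though I have not been able to confirm that.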
I’m a big dumdum and have not backed up in over 2 months. I need the data, so if possible, I’d like to keep it.
Edit: the full machine ID is `e784e2e0c47198`.
Thank you.
Resources: thank you Eric!