Postgres is down, cannot restart. Error no active leader found.

For anyone interested: I wasn't able to track down the root cause, but I did resolve the problem.

I initially tried restarting Postgres, but the command errored:

❯ flyctl pg restart --config fly/db.toml
Update available 0.0.463 -> 0.0.473.
Run "flyctl version update" to upgrade.
Error no active leader found
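
Before retrying, it can help to look at the machine state and recent logs. A hedged sketch using the same config path as above (these are standard flyctl commands, not something from the original output):

```shell
# List the machines in the Postgres app and their current state
flyctl machine list --config fly/db.toml

# Tail the Postgres logs to look for leader-election errors
flyctl logs --config fly/db.toml
```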

I tried upgrading the image, hoping it would force a restart, but unfortunately no luck:

❯ flyctl image update --config fly/db.toml 
The following changes will be applied to all Postgres machines.
Machines not running the official Postgres image will be skipped.

  	... // 3 identical lines
  		},
  		"init": {},
- 		"image": "flyio/postgres:14.6",
+ 		"image": "registry-1.docker.io/flyio/postgres:14.6@sha256:9cfb3fafcc1b9bc2df7c901d2ae4a81e83ba224bfe79b11e4dc11bb1838db46e",
  		"metadata": {
  			"fly-managed-postgres": "true",
  	... // 46 identical lines
  	
? Apply changes? Yes
Identifying cluster role(s)
  Machine 73d8d3d6a72389: error
Postgres cluster has been successfully updated!

I then thought I'd try scaling down and back up, but the scale command errored:

❯ flyctl scale count 2 --config fly/db.toml 
Error it looks like your app is running on v2 of our platform, and does not support this legacy command: try running fly machine clone instead
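
The error points at `fly machine clone` as the v2 way to scale out. A hedged sketch, reusing the machine ID from this post, in case scaling is what you're after:

```shell
# On apps v2, scaling out means cloning an existing machine
flyctl machine clone 73d8d3d6a72389 --config fly/db.toml
```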

The v2 platform doesn't support the legacy scale command, but there was a machine-level restart command:

❯ flyctl machine restart 73d8d3d6a72389 --config fly/db.toml              
Restarting machine 73d8d3d6a72389
  Waiting for 73d8d3d6a72389 to become healthy (started, 3/3)
Machine 73d8d3d6a72389 restarted successfully!

And we’re back to being healthy:

❯ flyctl checks list --config fly/db.toml                   
Health Checks for solitary-sun-2613
  NAME | STATUS  | MACHINE        | LAST UPDATED         | OUTPUT                                                                   
-------*---------*----------------*----------------------*--------------------------------------------------------------------------
  pg   | passing | 73d8d3d6a72389 | 54s ago              | [✓] transactions: read/write (245.12µs)                                  
       |         |                |                      | [✓] connections: 13 used, 3 reserved, 300 max (5.43ms)                   
-------*---------*----------------*----------------------*--------------------------------------------------------------------------
  role | passing | 73d8d3d6a72389 | 57s ago              | leader                                                                   
-------*---------*----------------*----------------------*--------------------------------------------------------------------------
  vm   | passing | 73d8d3d6a72389 | 2023-02-23T11:08:33Z | [✓] checkDisk: 827.39 MB (84.8%) free space on /data/ (60.61µs)          
       |         |                |                      | [✓] checkLoad: load averages: 0.05 0.16 0.31 (109.21µs)                  
       |         |                |                      | [✓] memory: system spent 0s of the last 60s waiting on memory (37.74µs)  
       |         |                |                      | [✓] cpu: system spent 5.75s of the last 60s waiting on cpu (23.74µs)     
       |         |                |                      | [✓] io: system spent 60ms of the last 60s waiting on io (22.24µs)        
-------*---------*----------------*----------------------*--------------------------------------------------------------------------

I don’t know why this fixed the problem or what the problem was but it is now resolved.
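
One thing that might help in similar "no active leader" situations: on a multi-node cluster, flyctl can trigger a failover so a replica is promoted to leader. A hedged sketch (this assumes a cluster with at least one healthy replica, which a single-machine setup like the one above doesn't have):

```shell
# Promote a replica to leader (multi-node clusters only)
flyctl postgres failover --config fly/db.toml
```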
