No leader found

Hi,

I'm running into some PostgreSQL issues, specifically:

❯ flyctl postgres restart --config fly/db.toml
Update available 0.0.424 -> 0.0.433.
Run "flyctl version update" to upgrade.
Error no leader found

I’ve tried scaling the PostgreSQL app down and back up, but that didn’t resolve the problem. Any help would be appreciated.

Hey there,

Looks like you’re running a pretty old version of flyctl. Updating it is typically the first thing you should do when you run into issues.

Could you run fly status --app <app-name> and post the output?

❯ fly status --app diana-backend-db
Update available 0.0.424 -> 0.0.433.
Run "fly version update" to upgrade.
App
  Name     = app_name          
  Owner    = owner                   
  Version  = 6                         
  Status   = running                   
  Hostname = app_name.fly.dev  
  Platform = nomad                     

Instances
ID      	PROCESS	VERSION	REGION	DESIRED	STATUS           	HEALTH CHECKS                 	RESTARTS	CREATED    
399359bb	app    	6      	mia   	run    	running (replica)	3 total, 2 passing, 1 critical	0       	11m55s ago	
94227085	app    	6      	sea   	run    	running (replica)	3 total, 2 passing, 1 critical	0       	11m55s ago	

Ok, so it looks like you have 4 volumes tied to your app. Your primary is likely on the sea volume that’s not currently allocated.

I would scale your app up to 4 and see if your primary comes back.
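
A rough sketch of that suggestion as a small shell helper, for anyone following along. It assumes the Nomad-era setup from this thread, where `fly scale count` controls the number of instances; the function name `scale_and_check` and the app name in the usage line are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list the app's volumes, scale the instance count,
# then print the status so you can see whether a leader came back.
# Usage: scale_and_check <app-name> <count>
scale_and_check() {
  app="$1"
  count="$2"
  # Show which volumes exist, so the count can be matched to them.
  fly volumes list --app "$app" &&
    # Ask for one instance per volume (4 in this thread).
    fly scale count "$count" --app "$app" &&
    # Look for a row marked "running (leader)" in the output.
    fly status --app "$app"
}
```

For example, `scale_and_check my-db-app 4` would run the three commands in order and stop at the first one that fails.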

2 volumes were set up for the web process and 2 volumes for the database.

Since this is a Postgres App, you should only create volumes that correspond to your database. Volumes that correspond to your web process should live with your Web app.

Sure - I’m not sure what has changed but that was the original setup (2 volumes in the web app and 2 in this Postgres app). This was working fine until 6am or so when I started getting errors. I haven’t created any additional volumes but I did scale down to 0 and back up to 2.

I have a second issue: I tried updating flyctl, but halfway through, brew complained about my Xcode version after it had already removed the previous version of fly. So now I have no flyctl to fix the problem until Xcode finishes installing…

Got it, so Nomad allocations are not pinned to a specific volume. When you scaled down to 0 and then back up to 2, Nomad chose 2 of the 4 available volumes pretty much at random, and the volume holding your primary wasn’t one of them. I hope that makes sense!

Yeah, brew takes forever to pick up new versions… Honestly, I’d uninstall the brew version and reinstall via curl:

You’ll get quicker access to new versions and won’t have to mess with brew.

Ok, upping the count to 4 has got the app back online. I’m having issues in syd, though:

❯ flyctl status --app diana-backend-db
App
  Name     = app_name          
  Owner    = owner                   
  Version  = 7                         
  Status   = running                   
  Hostname = app_name.fly.dev  
  Platform = nomad                     

Instances
ID      	PROCESS	VERSION	REGION	DESIRED	STATUS                	HEALTH CHECKS                 	RESTARTS	CREATED    
b6b7058b	app    	7      	sea   	run    	running (leader)      	3 total, 3 passing            	0       	1m28s ago 	
16564c69	app    	7      	syd   	run    	running (failed to co)	3 total, 1 passing, 2 critical	0       	1m28s ago 	
399359bb	app    	7      	mia   	run    	running (replica)     	3 total, 3 passing            	0       	35m38s ago	
94227085	app    	7      	sea   	run    	running (replica)     	3 total, 3 passing            	0       	35m38s ago

Seems to be resolved now.

@shaun I understand that Fly doesn’t provide hosted PostgreSQL, but is there a recommended way to monitor Fly Postgres applications for “issues”? Even if it’s a combination of dashboards or cobbled-together flyctl commands.

You have a few options:

We provide some basic health checks that you can monitor with:
fly checks list --app <app-name>

For a high level overview you can reach for:
fly status --app <app-name>

If you want a more granular look at how things are performing at the resource level, we provide a Grafana dashboard:

  1. fly dashboard --app <app-name>
  2. Click metrics
  3. In the top-right corner you’ll see “Open in Grafana”.

We also expose PG metrics, so you can always hook them up to your own monitoring service.
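
As a rough illustration of cobbling those two commands together, here is a minimal sketch that could run from cron. It parses the human-readable output, assuming the layouts pasted in this thread (a `leader` marker in `fly status` and pipe-separated columns in `fly checks list`); the function names are hypothetical and the output format may differ between flyctl versions.

```shell
#!/bin/sh
# Exit 0 if the `fly status` text on stdin mentions a leader,
# e.g. "running (leader)" or a ROLE column containing "leader".
has_leader() {
  grep -qw 'leader'
}

# Print how many rows of `fly checks list` text on stdin have
# "critical" in the STATUS column (second pipe-separated field).
count_critical() {
  awk -F'|' 'NF >= 2 {
    s = $2
    gsub(/[[:space:]]/, "", s)
    if (s == "critical") n++
  } END { print n + 0 }'
}

# Check one Postgres app: print any problems found and
# return non-zero if there is no leader or a check is critical.
check_pg() {
  app="$1"
  rc=0
  if ! fly status --app "$app" | has_leader; then
    echo "$app: no leader reported by fly status"
    rc=1
  fi
  critical=$(fly checks list --app "$app" | count_critical)
  if [ "$critical" -gt 0 ]; then
    echo "$app: $critical critical health check(s)"
    rc=1
  fi
  return "$rc"
}
```

Calling `check_pg <app-name>` from a scheduled job and alerting on a non-zero exit code is one way to get a notification; wiring it to email or chat is left to whatever tooling you already use.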

Thank you!

Following up on this for others in this situation.

fly postgres list
NAME OWNER STATUS LATEST DEPLOY
wwwhww abc deployed

then

fly postgres restart -a wwwhww
Error: no active leader found

fly checks list --app wwwhww
Health Checks for wwwhww
NAME | STATUS | MACHINE | LAST UPDATED | OUTPUT
---------------------------------------------------------------------------------------------------------------------------------
pg | critical | 4d89040f0d1387 | 2023-10-09T06:41:50Z | 500 Internal Server Error
| | | | failed to connect to proxy: context deadline exceeded

fly machine restart 4d89040f0d1387 -a wwwhww
Restarting machine 4d89040f0d1387
Waiting for 4d89040f0d1387 to become healthy (started, 3/3)
Machine 4d89040f0d1387 restarted successfully!
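
The manual steps above could be scripted roughly like this. It is only a sketch: it assumes the `fly checks list` layout shown in this post (pipe-separated, machine ID in the third column), the function names are invented, and blindly restarting a machine with a critical check may not be the right fix in every case.

```shell
#!/bin/sh
# Read `fly checks list` text on stdin and print the unique machine IDs
# whose STATUS column (second field) is "critical".
critical_machines() {
  awk -F'|' 'NF >= 3 {
    status = $2
    id = $3
    gsub(/[[:space:]]/, "", status)
    gsub(/[[:space:]]/, "", id)
    if (status == "critical" && id != "") print id
  }' | sort -u
}

# Restart every machine of the given app that has a critical check.
# Usage: restart_critical <app-name>
restart_critical() {
  app="$1"
  fly checks list --app "$app" | critical_machines | while read -r id; do
    # </dev/null keeps the restart command from consuming the ID list.
    fly machine restart "$id" -a "$app" </dev/null
  done
}
```

With the output above, `restart_critical wwwhww` would end up running `fly machine restart 4d89040f0d1387 -a wwwhww`.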

I think there should be some kind of warning system from fly.io for when the pg goes from “leader” to “error”. Since fly.io already knows the current status, why do I need to duplicate the effort of monitoring the same instance, assuming it even responds? I imagine that since Fly knows the status, it could just pass the message along. I think other cloud providers do that with a few clicks.

ID STATE ROLE REGION CHECKS IMAGE CREATED UPDATED
4d89040f0d1387 started leader ord 3 total, 3 passing flyio/postgres:14 (v0.0.32) 2022-11-03T18:23:37Z 2023-10-18T18:51:41Z

Thanks
Lucas