No leader found

dad · November 21, 2022, 8:54pm

Hi,

I'm running into some PostgreSQL issues, specifically:

❯ flyctl postgres restart --config fly/db.toml
Update available 0.0.424 -> 0.0.433.
Run "flyctl version update" to upgrade.
Error no leader found

I’ve tried scaling the PostgreSQL app down and back up, but that didn’t resolve the problem. Any help would be appreciated.

shaun · November 21, 2022, 8:58pm

Hey there,

Looks like you’re running a pretty old version of flyctl. Updating it is typically the first thing you should do when you’re running into issues.

Could you run fly status --app <app-name> and post the output?

dad · November 21, 2022, 8:59pm

❯ fly status --app diana-backend-db
Update available 0.0.424 -> 0.0.433.
Run "fly version update" to upgrade.
App
  Name     = app_name          
  Owner    = owner                   
  Version  = 6                         
  Status   = running                   
  Hostname = app_name.fly.dev  
  Platform = nomad                     

Instances
ID      	PROCESS	VERSION	REGION	DESIRED	STATUS           	HEALTH CHECKS                 	RESTARTS	CREATED    
399359bb	app    	6      	mia   	run    	running (replica)	3 total, 2 passing, 1 critical	0       	11m55s ago	
94227085	app    	6      	sea   	run    	running (replica)	3 total, 2 passing, 1 critical	0       	11m55s ago

shaun · November 21, 2022, 9:09pm

Ok, so it looks like you have 4 volumes tied to your app. Your primary is likely on the sea volume that’s not currently allocated.

I would scale your app up to 4 and see if your primary comes back.
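
To confirm whether the primary actually came back after scaling, a small check along these lines might help. This is only a sketch: it assumes the role shows up in the `fly status` table either as "(leader)" (Nomad-style output) or as a bare "leader" column (Machines-style output), as in the outputs posted in this thread. `leader_id` and `has_leader` are hypothetical helper names, not flyctl features.

```shell
# Sketch only: helpers that read `fly status` output on stdin.
# Assumption: the leader row contains the field "(leader)" or "leader".

# Prints the ID (first column) of every row whose role is leader.
leader_id() {
  awk '{
    for (i = 2; i <= NF; i++) {
      if ($i == "leader" || $i == "(leader)") { print $1; next }
    }
  }'
}

# Exits 0 if at least one leader row is present, 1 otherwise.
has_leader() {
  [ -n "$(leader_id)" ]
}

# Usage (replace <app-name> with your Postgres app):
#   fly scale count 4 --app <app-name>
#   fly status --app <app-name> | has_leader || echo "still no leader"
```

If `has_leader` keeps failing after the scale-up, the instance table from `fly status` is the next thing to look at.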

dad · November 21, 2022, 9:09pm

2 volumes were set up for the web process and 2 volumes for the database.

shaun · November 21, 2022, 9:13pm

Since this is a Postgres App, you should only create volumes that correspond to your database. Volumes that correspond to your web process should live with your Web app.

dad · November 21, 2022, 9:16pm

Sure - I’m not sure what has changed but that was the original setup (2 volumes in the web app and 2 in this Postgres app). This was working fine until 6am or so when I started getting errors. I haven’t created any additional volumes but I did scale down to 0 and back up to 2.

I have a second issue: I tried updating flyctl, but halfway through, brew complained about my Xcode version after it had already removed the previous version of fly. So now I have no flyctl to fix the problem until Xcode finishes installing…

shaun · November 21, 2022, 9:21pm

Got it, so Nomad allocations are not pinned to a specific volume. When you scaled down to 0 and then back up to 2, Nomad picked 2 of the 4 available volumes pretty much at random, and the volume holding your primary was not one of them. I hope that makes sense!

dad wrote: “I have a second issue: I tried updating flyctl, but halfway through, brew complained about my Xcode version after it had already removed the previous version of fly. So now I have no flyctl to fix the problem until Xcode finishes installing…”

Yeah, brew takes forever to pick up new versions… Honestly, I’d uninstall the brew version and re-install via the curl install script.

You’ll get quicker access to new versions and won’t have to mess with brew.

dad · November 21, 2022, 9:24pm

Ok, upping the count to 4 has got the app back online. I’m having issues in syd, though:

❯ flyctl status --app diana-backend-db
App
  Name     = app_name          
  Owner    = owner                   
  Version  = 7                         
  Status   = running                   
  Hostname = app_name.fly.dev  
  Platform = nomad                     

Instances
ID      	PROCESS	VERSION	REGION	DESIRED	STATUS                	HEALTH CHECKS                 	RESTARTS	CREATED    
b6b7058b	app    	7      	sea   	run    	running (leader)      	3 total, 3 passing            	0       	1m28s ago 	
16564c69	app    	7      	syd   	run    	running (failed to co)	3 total, 1 passing, 2 critical	0       	1m28s ago 	
399359bb	app    	7      	mia   	run    	running (replica)     	3 total, 3 passing            	0       	35m38s ago	
94227085	app    	7      	sea   	run    	running (replica)     	3 total, 3 passing            	0       	35m38s ago

dad · November 21, 2022, 9:30pm

Seems to be resolved now.

dad · November 21, 2022, 9:36pm

@shaun I understand that fly doesn’t provide hosted PostgreSQL, but is there a recommended way to monitor fly Postgres applications for “issues”? Even if it’s a combination of dashboards or cobbled-together flyctl commands.

shaun · November 21, 2022, 9:50pm

You have a few options:

We provide some basic health checks that you can monitor with:
fly checks list --app <app-name>

For a high level overview you can reach for:
fly status --app <app-name>

If you want a more granular look at how things are performing at the resource level, we provide a Grafana dashboard:

1. Run fly dashboard --app <app-name>
2. Click “Metrics”.
3. In the top-right corner you’ll see “Open in Grafana”.

We also expose PG metrics, so you can always hook them up to your own monitoring service.

fly-apps/postgres-ha/blob/main/fly.toml#L46-L48 (github.com):

          [metrics]
            path = "/metrics"
            port = 9187
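
If you go the "own monitoring" route, the endpoint above serves plain-text Prometheus metrics, so even a shell script can pull a single value out of it. The following is a rough sketch, not an official recipe: `metric_value` is a hypothetical helper, the metric name `pg_up` is an assumption about what the exporter on port 9187 reports, and `<app-name>.internal` is assumed to be reachable from wherever the script runs.

```shell
# Sketch only: print the value of one metric from Prometheus text format on stdin.
# Assumptions: samples carry no trailing timestamp, and only the first
# sample with the given name is of interest.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                  # skip HELP / TYPE comment lines
    {
      key = $1
      sub(/[{].*/, "", key)        # drop any {label="..."} suffix
      if (key == name) { print $NF; exit }
    }'
}

# Usage (host and metric name are assumptions):
#   curl -s http://<app-name>.internal:9187/metrics | metric_value pg_up
```

An empty result means the metric was not found in the scrape, which is worth treating as a failure in its own right.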

dad · November 21, 2022, 10:07pm

Thank you!

lucasmanual · October 18, 2023, 7:10pm

Following up on this for others in this situation.

fly postgres list
NAME OWNER STATUS LATEST DEPLOY
wwwhww abc deployed

then

fly postgres restart -a wwwhww
Error: no active leader found

fly checks list --app wwwhww
Health Checks for wwwhww
NAME | STATUS | MACHINE | LAST UPDATED | OUTPUT
---------------------------------------------------------------------------------------------------------------------------------
pg | critical | 4d89040f0d1387 | 2023-10-09T06:41:50Z | 500 Internal Server Error
| | | | failed to connect to proxy: context deadline exceeded
…
fly machine restart 4d89040f0d1387 -a wwwhww
Restarting machine 4d89040f0d1387
Waiting for 4d89040f0d1387 to become healthy (started, 3/3)
Machine 4d89040f0d1387 restarted successfully!
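
For anyone who wants to script the sequence above (find the machine behind a critical check, then restart it), here is a minimal sketch. It assumes `fly checks list` prints the pipe-separated table shown above, with STATUS in the second column and MACHINE in the third; `critical_machines` is a hypothetical helper name.

```shell
# Sketch only: read `fly checks list` output on stdin and print the
# machine IDs that have at least one check in "critical" status.
# Assumption: columns are separated by "|" as NAME | STATUS | MACHINE | ...
critical_machines() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    trim($2) == "critical" { print trim($3) }
  ' | sort -u
}

# Usage (replace <app-name> with your Postgres app):
#   for m in $(fly checks list --app <app-name> | critical_machines); do
#     fly machine restart "$m" --app <app-name>
#   done
```

Restarting blindly in a loop is a blunt tool, so it is probably safer to print the IDs first and restart by hand.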

I think there should be some kind of warning from fly.io when the pg goes from “leader” to “error”. Since fly.io already knows the current status, why do I need to duplicate the effort of monitoring the same instance (assuming it even responds)? I imagine that since fly knows the status, it could just pass the message along. I think other cloud providers do that with a few clicks.

ID STATE ROLE REGION CHECKS IMAGE CREATED UPDATED
4d89040f0d1387 started leader ord 3 total, 3 passing flyio/postgres:14 (v0.0.32) 2022-11-03T18:23:37Z 2023-10-18T18:51:41Z

Thanks
Lucas
