Postgres is down, cannot restart. Error no active leader found.

Hi fly.io community,

I migrated from Heroku back in August. Everything was working really well until recently when my postgres server just stopped working (without any prompting from me, I think).

The machine for my postgres server seems to be stuck in “starting” as far as I can tell.

When I try to connect:

$ flyctl postgres connect -a rep-db
Error no active leader found

When I try to restart the machine:

$ flyctl machine restart 32871e1f692e85 -a rep-db
Restarting machine 32871e1f692e85
failed to release lease for machine 32871e1f692e85: lease not found
Error failed to restart machine 32871e1f692e85: could not stop machine 32871e1f692e85: failed to restart VM 32871e1f692e85: failed to wait for machine to be started

What is my next step? I noticed in the fly.io postgres documentation that as of flyctl version v0.0.412, the postgres clusters are created using “next-gen Apps V2 architecture, built on Fly Machines” instead of on Nomad architecture.

Is this the root of my issue? I’m using flyctl version 0.0.435 (fly v0.0.435 darwin/amd64 Commit: c5149629 BuildDate: 2022-11-22T16:36:15Z)

Thank you

In case this is helpful, when I try to deploy my Django application, I get this error:

--> You can detach the terminal anytime without stopping the deployment
==> Release command detected: python manage.py migrate

--> This release will not be available until the release command succeeds.
	 Starting instance
	 Configuring virtual machine
	 Unpacking image
	 Preparing kernel init
	 UUID=c3b8aabc-c66f-40d3-bc0c-900018c3ae63
	 Preparing to run: `python manage.py migrate` as root
	 2022/11/28 02:52:54 listening on [fdaa:0:88d8:a7b:e770:b2b0:2e09:2]:22 (DNS: [fdaa::3]:53)
	 Traceback (most recent call last):
	     return func(*args, **kwargs)
	            ^^^^^^^^^^^^^^^^^^^^^
	   File "/usr/local/lib/python3.1
	     connection = Database.connect(**conn_params)
	 psycopg2.OperationalError: could not translate host name "top2.nearest.of.rep-db.internal" to address: Name does not resolve
	 Traceback (most recent call last):
	   File "/usr/local/lib/python3.11/site-packages/django/core/management/base.py", line 354, in run_from_argv
	   File "/usr/local/lib/python3.11/site-packages/django/core/management/base.py", line 398, in execute
	   File "/usr/local/lib/python3.11/site-packages/django/core/management/base.py", line 89, in wrapped
	           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	   File "/usr/local/lib/python3.11/site-packages/django/core/management/commands/migrate.py", line 75, in handle
	   File "/usr/local/lib/python3.11/site-packages/django/core/management/base.py", line 419, in check
	     all_issues = checks.run_checks(
	      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	   File "/usr/local/lib/python3.11/site-packages/django/db/models/base.py", line 1682, in _check_indexes
	     connection.features.supports_covering_indexes or
	     res = instance.__dict__[self.name] = self.func(instance)
	     return next(self.gen)
	     with self.cursor() as cursor:
	     return func(*args, **kwargs)
	            ^^^^^^^^^^^^^^^^^^^^^
	     return func(*args, **kwargs)
	                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	   File "/usr/local/lib/python3.11/site-packages/psycopg2/__init__.py", line 122, in connect
	     conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
	            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	 django.db.utils.OperationalError: could not translate host name "top2.nearest.of.rep-db.internal" to address: Name does not resolve
	 Starting clean up.

rep-db is the name of the application for my postgres instance
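
For anyone hitting the same traceback: the failure happens at the name lookup, before Django ever talks to Postgres. A quick way to confirm that is to try resolving the host directly. This is only a minimal sketch, assuming it is run from inside the app's VM where the .internal names are supposed to resolve; the function name can_resolve is made up for illustration.

```python
import socket


def can_resolve(hostname: str, port: int = 5432) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        # getaddrinfo does the name-to-address lookup a client needs
        # before it can open a connection.
        return len(socket.getaddrinfo(hostname, port)) > 0
    except socket.gaierror:
        # Raised when the name cannot be translated to an address.
        return False
```

Calling can_resolve("top2.nearest.of.rep-db.internal") with the hostname from the traceback should return False for as long as the lookup is failing, which points at the database app rather than at the Django settings.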

I’ve also restored a snapshot into a second Postgres instance, which seems to be doing just fine (and thankfully has all the data I need). However, I cannot detach the original, broken instance. I get the same “Error no active leader found” message:

$ fly postgres detach rep-db
Error no active leader found

I can’t attach the new db instance before detaching the current one.

Same issue here.

After seeing this in the documentation:

This Is Not Managed Postgres

Before you use Fly Postgres, here are some things worth understanding about it:

Fly Postgres is a regular app you deploy on Fly.io, with an automated creation process and some platform integration to simplify management. It relies on building blocks available to all Fly apps, like flyctl, volumes, private networking, health checks, logs, metrics, and more. The source code is available on GitHub to view and fork.

This is not a managed database. If Postgres crashes because it ran out of memory or disk space, you’ll need to do a little work to get it back.

I realized what I actually need & want is a managed Postgres service. Using the 2nd instance of Postgres based on the snapshot, I migrated my Postgres database to another managed provider.

This sounds like the instance might have run out of disk space. If you run into an issue like this in the future, run fly checks list. This should tell you what’s actually failing on the Postgres instance.
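
If you want to script that check, here is a rough sketch that runs fly checks list and keeps only the checks that are not healthy. It assumes the output is the pipe-delimited table with the NAME | STATUS | MACHINE | LAST UPDATED | OUTPUT header, and that a healthy check reports the status "passing"; both are assumptions about the CLI output, and the function names are made up for illustration.

```python
import subprocess


def parse_checks(table: str) -> list[dict[str, str]]:
    """Parse the pipe-delimited table into one dict per check."""
    rows = []
    header = None
    for line in table.splitlines():
        if "|" not in line:
            continue  # skips the title line and the ----*---- separator
        if header is None:
            # First pipe-delimited line is the column header.
            header = [cell.strip().lower() for cell in line.split("|")]
            continue
        # Limit the split so a "|" inside the OUTPUT column is preserved.
        cells = [cell.strip() for cell in line.split("|", len(header) - 1)]
        rows.append(dict(zip(header, cells)))
    return rows


def failing_checks(table: str, healthy: str = "passing") -> list[dict[str, str]]:
    """Return the checks whose status differs from the assumed healthy value."""
    return [row for row in parse_checks(table) if row.get("status") != healthy]


def fetch_checks(app: str) -> str:
    """Run `fly checks list -a <app>` and return its raw output."""
    result = subprocess.run(
        ["fly", "checks", "list", "-a", app],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Something like failing_checks(fetch_checks("rep-db")) would then list whatever is failing. An empty result only means that no failing rows were printed, not that the instance is healthy.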

With a fresh Postgres deployment and absolutely no use, I get this error.

flyctl postgres connect -a correct_db_name
=> Error no active leader found

Hard to understand what’s going on.

I have the same issue. In the console I saw a restart (“by Fly Admin Bot 4 days ago”) as well as a new secret being set for FLY_CONSUL_URL.

$ flyctl pg restart -a myapp
Error no active leader found
$ flyctl checks -c myapp.toml
Health Checks for myapp
  NAME | STATUS | MACHINE | LAST UPDATED | OUTPUT
-------*--------*---------*--------------*---------
$ flyctl ping myapp.internal
Error get app: Could not find App