Could not translate host name "top1.nearest.of.***.internal" to address: Name or service not known


I’ve been running a small Django + Postgres app for almost a year without issue. A few days ago, without deploying a new revision or any changes from my end, Django started encountering errors when connecting to the database, which brought the whole application down:

File "/usr/local/lib/python3.10/dist-packages/django/db/backends/postgresql/", line 215, in get_new_connection
connection = Database.connect(**conn_params)
File "/usr/local/lib/python3.10/dist-packages/psycopg2/", line 122, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: could not translate host name "top1.nearest.of.checkin-db.internal" to address: Name or service not known

The checkin-db app in my account is OK (it was deployed with fly postgres). I can SSH into it and look at the tables just fine. I did notice that the user “Fly Admin Bot” set a new env var, FLY_CONSUL_URL, in the DB application a few days ago (hard to say if the timing matches). Could this be related?

Weirdly enough, I’m now unable to SSH into the Django container:

fly ssh console
Connecting to fdaa:0:a484:a7b:7a:609b:5f49:2... complete
Error error connecting to SSH server: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain

And in the Monitoring tab in the dashboard I see:

unexpected error: transient SSH server error: can't resolve _orgcert.internal
unexpected error: [ssh: no auth passed yet, transient SSH server error: can't resolve _orgcert.internal]

Others with similar issues from the last day or so:

There was a DNS issue in the Miami region that could have caused these errors. It was fixed a couple of days ago.
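
If it helps to confirm that name resolution is working again, here is a minimal sketch of a check that could be run from inside the app container once SSH access is back. It assumes the failure is a plain lookup error of the kind shown in the traceback; `can_resolve` and its `resolver` parameter are hypothetical names used only for illustration, and port 5432 is just the usual Postgres default.

```python
import socket


def can_resolve(host, port=5432, resolver=socket.getaddrinfo):
    """Return True if `host` resolves to at least one address.

    `resolver` is an illustrative hook so the lookup can be swapped out;
    by default it is the standard library's socket.getaddrinfo.
    """
    try:
        return len(resolver(host, port)) > 0
    except socket.gaierror:
        # Lookup failed, e.g. "Name or service not known"
        return False
```

Calling `can_resolve("top1.nearest.of.checkin-db.internal")` should return `True` once the internal DNS is answering again, and `False` while the lookup is still failing.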