App suddenly cannot connect to postgres db

Hi, my Django web app can no longer connect to Postgres. It was fine earlier this week, but now my web URL returns a 500 error. I did not make any changes to the codebase or infrastructure this week.

django error:
django.db.utils.OperationalError: could not translate host name "top2.nearest.of.mtdb-db.internal" to address: No address associated with hostname

Looking at flyctl postgres list
the db seems to be suspended?

Also when trying flyctl postgres restart
I get the message Error no active leader found

flyctl status -a mtdb-db

ID              STATE           ROLE    REGION  HEALTH CHECKS   IMAGE                           CREATED                UPDATED
6e82575fe17387  starting        error   ord     3 total         flyio/postgres:14.4 (v0.0.32)   2022-10-30T20:01:14Z   2022-11-27T08:24:03Z

flyctl checks list -a mtdb-db

  NAME | STATUS  | MACHINE        | LAST UPDATED | OUTPUT
-------*---------*----------------*--------------*--------------------------
  pg   | warning | 6e82575fe17387 | 0s           | the machine is starting
-------*---------*----------------*--------------*--------------------------
  role | warning | 6e82575fe17387 | 0s           | the machine is starting
-------*---------*----------------*--------------*--------------------------
  vm   | warning | 6e82575fe17387 | 0s           | the machine is starting
-------*---------*----------------*--------------*--------------------------

It seems to be stuck in the ‘machine is starting’ phase.

Any help would be appreciated.
Thanks!
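
A note on the error above: "could not translate host name" is a DNS lookup failure rather than a Postgres authentication or network error, which would be consistent with there being no running instance behind the .internal name. Below is a minimal sketch for checking that from the app side. The helper name can_resolve and the default port 5432 are my own assumptions for illustration; only socket.getaddrinfo comes from the Python standard library.

```python
import socket


def can_resolve(hostname, port=5432, resolver=socket.getaddrinfo):
    """Return True if `hostname` resolves to at least one address.

    `resolver` defaults to the standard library lookup; it is a parameter
    only so the check can be exercised without real DNS.
    """
    try:
        return len(resolver(hostname, port)) > 0
    except socket.gaierror:
        # Same class of failure Django reports as
        # "could not translate host name ... to address".
        return False
```

Calling can_resolve("top2.nearest.of.mtdb-db.internal") from inside the app's private network should return False while the database is down, and True once an instance is running again.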

I have this exact issue with a Rails app. I’m not able to find any help on how to resolve it.

I have the same issue as well, except I don’t see any machines at all in the database list.

 flyctl checks list -a myapp-db
Health Checks for myapp-db
  NAME | STATUS | ALLOCATION | REGION | TYPE | LAST UPDATED | OUTPUT
-------*--------*------------*--------*------*--------------*---------

Does anyone know how to resolve this?

Run fly status --all. Your DB is likely crashing, which is most frequently due to out of memory issues or disk space issues.

Thanks for the quick reply. It has now come back up. Perhaps restarting it fixed it, though it didn’t work immediately.

I seem to have 80% free space left, so not sure what went wrong. Oh well, I hadn’t restarted since October, so maybe it was due.

I am also running into this problem. My database is marked as ‘dead’ and I’ve been unable to restart it.

fly pg restart --config fly.toml --force -a myapp-db
> Error no leader found
flyctl checks list -a myapp-db
> Health Checks for myapp-db
>  NAME | STATUS | ALLOCATION | REGION | TYPE | LAST UPDATED | OUTPUT  
> -------*--------*------------*--------*------*--------------*---------

How can I troubleshoot this?

I spent some more time messing around with different fly commands and still no luck.

fly postgres config show -a myapp-db
Error no 6pn ips found for myapp-db app
fly status --app myapp-db         
App
  Name     = myapp-db          
  Owner    = personal                  
  Version  = 0                         
  Status   = dead                      
  Hostname = myapp-db.fly.dev  
  Platform = nomad                     

Instances
ID	PROCESS	VERSION	REGION	DESIRED	STATUS	HEALTH CHECKS	RESTARTS	CREATED 

I can’t seem to get any logs or find any way to restart. Is there another way around this issue? Like maybe starting a new database and transferring the data somehow?

Had this issue twice, happened once last year and once again this year.

Last year I had to restart the app; this year it fixed itself within an hour. Fly does say the DBs aren’t managed, soooo, yeah.

Thanks for the reply. This is also the second time I’ve run into the issue; the first time it fixed itself, as you mention. Your reply led me to find some fly documentation that mentioned the scale command, and running the following resurrected my db:

fly scale count 1 -a myapp-db

It’s still not fully clear to me why it went down in the first place, but it’s back up now. Thanks again!
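
For anyone landing here later, here is a rough sketch that strings together the two commands from this thread that helped: fly status --all to look for crashed instances, then fly scale count 1 to bring one back. The function names and the dry-run default are my own assumptions; the commands and flags are exactly the ones posted above, and this was only reported to work on a Nomad-platform app with zero instances.

```python
import subprocess


def recovery_commands(app):
    """Build the two fly commands mentioned in this thread for `app`."""
    return [
        ["fly", "status", "--all", "-a", app],      # inspect instances, including dead ones
        ["fly", "scale", "count", "1", "-a", app],  # ask for one instance again
    ]


def run_recovery(app, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in recovery_commands(app):
        print("$", " ".join(cmd))
        if not dry_run:
            # check=False: keep going so the output of both commands is visible
            subprocess.run(cmd, check=False)
```

Running run_recovery("myapp-db") only prints the commands; pass dry_run=False to actually execute them, which requires flyctl to be installed and logged in.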
