DB is suddenly suspended, no way to bring it back up

I’ve been deploying a small Rails app to fly.io. Started having some slow deploys yesterday, but things eventually resolved. Today, deploys are failing because I’m not able to connect to my postgres instance, which is suspended. I don’t know why, can’t see any logs on the instance, and can’t bring it back up. I’ve tried restarting the app to no avail. During the deploy the only error I get is

ActiveRecord::ConnectionNotEstablished: could not translate host name "top2.nearest.of.frosty-bush-477-db.internal" to address: No address associated with hostname

I never had to manually set any host names to get any of this working before—it just all ran itself with no issues. Can anyone help?

Hmm :thinking:

Regarding the hostname: yes, you don’t need to do anything to create it (as you say), since that .internal name is created by Fly as part of the private network. There is a range of .internal domains depending on what you want to do: Private Networking · Fly Docs.

It seems like several people are finding random suspended databases … Do you get any errors when you restart it? I ask only because that was the suggested solution from someone at Fly earlier, in this thread:


Thanks for your reply. When I try to restart the postgres instance directly via fly postgres restart frosty-bush-477-db I get Error app frosty-bush-477 is not a postgres app. I’ve also tried fly image update -a frosty-bush-477-db, but the response I get is what looks like a file editor with some edited lines in it, like so:

`The following changes will be applied to all Postgres machines.
Machines not running the official Postgres image will be skipped.

... // 8 identical lines
		"tty": false
	},
  •   "image": "flyio/postgres:14.4",
    
  •   "image": "registry-1.docker.io/flyio/postgres:14.6",
      "metadata": {
      	"managed-by-fly-deploy": "true"
    
    … // 49 identical lines

? Apply changes? Yes`

When I hit return, I get the following:

Identifying cluster role(s) Machine e286061a6d4286: error Postgres cluster has been successfully updated!

But this doesn’t change anything. DB is still suspended, and my entire site is still down :confused:


Ah. That’s not good.

All I can suggest (perhaps you did this and the command got mangled) is that you need to pass the app’s name with the -a flag in the restart command. It’s a subtle difference, but it matters: if you run a fly command from an app’s folder without specifying an app, it defaults to the one in the fly.toml file in that folder. I would assume that is your Rails app, which would cause it to complain that it’s not a postgres app and could cause confusion:

$ fly postgres restart --help
Restarts each member of the Postgres cluster one by one. Downtime should be minimal.

Usage:
  flyctl postgres restart [flags]

Flags:
  -a, --app string           Application name

So perhaps first double-check the command is exactly correct, using your postgres app’s name in that flag. If it still does not think it is a postgres app, er, that would be odd! Looking at the docs … it seems there may be a cross-over between legacy apps and “Machines”. But if you created this database in the past few days, with the latest Fly CLI, that shouldn’t be an issue, as all new ones seem to default to the new Machines: Fly Postgres · Fly Docs. So I’m not sure what else to suggest.
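To see which app a fly command would fall back to when -a is omitted, you can read the app key out of the fly.toml in the current folder. This is only a rough sketch: fly_default_app is a made-up helper name, not part of flyctl, and it assumes app is written as a simple quoted top-level key.

```shell
#!/bin/sh
# Hypothetical helper (not part of flyctl): print the value of the
# top-level `app` key from a fly.toml file. Assumes the key is written
# as app = "name" or app = 'name' at the start of a line.
fly_default_app() {
  file="${1:-fly.toml}"
  sed -n "s/^app[[:space:]]*=[[:space:]]*[\"']\([^\"']*\)[\"'].*/\1/p" "$file" | head -n 1
}
```

If this prints the Rails app’s name rather than the database app’s name, that would explain the “not a postgres app” error, and running fly postgres restart -a frosty-bush-477-db explicitly avoids the fallback.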


Thanks again for your response. I did what you suggested and made sure I was typing the full, correct command. The response I get now is:

Error no active leader found

So, it appears I am reaching the instance, but that something is really wrong further up the chain. Not good at all. In the meantime my site is still 100% dead.

You’re welcome.

Ok, that solves the “not a postgres app” issue, but yes, no active leader is not good.

Is it out of disk space by any chance? That could explain the issue.

If not, er, have you tried fly checks list -a yourdbapp? That may show which health check is failing, which could explain why it can’t be restarted or has no leader.

@nickmjones I mentioned this in the other thread, but you need to start the machine back up.

The process to do this is:

fly machines list --app <pg-app-name>

Look for any machines that are in a stopped state.

fly machines start <machine-id> --app <pg-app-name>

I went ahead and did this for you and your Postgres should be back up and running.
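For anyone who has more than one stopped machine, the steps above can be wrapped in a small loop. This is a minimal sketch, not an official procedure: start_machines is a made-up helper name, and it assumes you have already picked out the stopped machine IDs by eye from the fly machines list output.

```shell
#!/bin/sh
# Hypothetical helper (not part of flyctl): start each given machine
# in the named Postgres app, stopping at the first failure.
# Usage: start_machines <pg-app-name> <machine-id> [<machine-id> ...]
start_machines() {
  app="$1"
  shift
  for id in "$@"; do
    echo "Starting machine $id in $app"
    # Same command as in the steps above, once per machine ID.
    fly machines start "$id" --app "$app" || return 1
  done
}
```

For the database in this thread that would be something like start_machines frosty-bush-477-db e286061a6d4286, after first confirming the machine’s state with fly machines list --app frosty-bush-477-db.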


@shaun Just wanted to thank you for jumping on this. Sincerely appreciate it!
