Region deprecated - empty PostgreSQL backups

My app stopped working. After a while I realized the MAD region had been deprecated and that my attempt to migrate had failed (the fly.toml file had the configuration, but I couldn’t scale up).

Now the Postgres app isn’t working either, and I can’t get a backup from my volume.
I followed these instructions without success.

I can see the new volumes in the list, but I can’t attach the app to them:

blm git:(master) ✗ fly volumes list -a tekclean-db
ID                   	STATE  	NAME   	SIZE	REGION	ZONE	ENCRYPTED	ATTACHED VM	CREATED AT
vol_493km73182nee234*	created	pg_data	1GB 	mad   	9c7c	true     	           	3 weeks ago
vol_re88qk3gqnoo691r 	created	pg_data	1GB 	lhr   	4e23	true     	           	4 hours ago
vol_vpggn6o1mk0y10ev 	created	pg_data	1GB 	lhr   	ac54	true     	           	3 hours ago
vol_vwjjomg1mz7dz6mr 	created	pg_data	1GB 	lhr   	6d06	true     	           	3 hours ago
➜  blm git:(master) ✗ fly volumes snapshots list vol_493km73182nee234
Snapshots
ID                      	STATUS 	SIZE      	CREATED AT	RETENTION DAYS
vs_JbVOZkmg4egXUQ29JV5p 	created	1073741824	5 days ago	5
vs_XvLZ9Jy2XM2oc6eoy1G26	created	1073741824	6 days ago	5
vs_QeOZ2oqm30mAhb8KxbMY 	created	1073741824	1 week ago	5
vs_XvLZ9Jy2XM2oc6JaPVXq1	created	1073741824	1 week ago	5
vs_4b0K5k1pgJpBU3waPnGw 	created	1073741824	1 week ago	5

This is the message I’m seeing:
Some of your apps in MAD region are on a host has suffered irreparable hardware damage. Please migrate your Fly Machines to other hosts and restore volumes from backups. You are not being charged for this resources on this host.

I’m open to paying for support and signing up for an organization plan if you think it would help.

I don’t have a direct answer to your question, but I would suggest copying any snapshots somewhere safe immediately. People here have reported problems with their 5-day snapshots 10 days after a host failure, by which point, of course, all their snapshots had expired.

I believe you can increase snapshot retention to 30 days, but out of an abundance of caution I would take the offsite backups first.

Thanks for the answer and time.

I can’t pay for support yet. I just switched to a pay-as-you-go plan, but it only takes effect on October 1st, and until then I’m not allowed to pay for support.

I followed these instructions to restore my DB app from one of the available snapshots. However, because of the open incident in the MAD region, I can’t list the image ref, which the restore process requires. I also looked for a way to download a snapshot to a local drive so I could store it, but I couldn’t get that to work.

I’m a bit stressed, because no one likes losing about 1.5 years of data (that’s how old my last local pg_dump is). I regret not making more dumps.

pg_dump can’t connect to the database either, again because of the incident.

At least I can still see these messages:

We’re addressing an incident that affects one or more of your apps. Please check the status page for more details.
and
You don’t have to wait: Learn how to recover and move away from an unresponsive host.
I couldn’t get the recovery steps from that second one to work, though.

Just to confirm: if you run fly m ls -a YOUR_POSTGRES_APP_NAME, does the table have no rows?

This might be happening because you’re scaling an app based in mad, which no longer accepts new Machines. It might be worth trying fly scale count --region lhr 1 (it looks like you may have done that already, given that you have volumes in lhr).

edit: it seems fly scale count isn’t expected to work for unmanaged PG apps

1 machines have been retrieved from app tekclean-db.
View them in the UI here

tekclean-db
ID             	NAME           	STATE  	CHECKS	REGION	ROLE	IMAGE	IP ADDRESS	VOLUME	CREATED	LAST UPDATED	PROCESS GROUP	SIZE
6e82570b2259e8*	young-frost-254	started	      	mad

* These Machines' hosts could not be reached.
! WARNING: There are active host issues affecting your app. Please check `fly incidents hosts list` or visit your app in https://fly.io/dashboard

Thanks :raising_hands:

And yes, for unmanaged PG apps I get:

➜  blm git:(master) ✗ fly scale count 1 -a tekclean-db --region lhr
Error: failed to grab app config from existing machines, error: could not create a fly.toml from any machines :-(
No machines configured for this app

Okay, as an alternative, I think you can get the image details from the GraphQL API. Try the following curl command, replacing YOUR_ORG_SLUG with the slug of the organisation that contains your unmanaged Postgres (UPG) app, and YOUR_POSTGRES_APP_NAME with your UPG app’s name:

curl 'https://api.fly.io/graphql' \
    -H 'Content-Type: application/json' \
    -H "authorization: $(fly tokens create readonly YOUR_ORG_SLUG)" \
    -d '{"query":"query { app(name:\"YOUR_POSTGRES_APP_NAME\") {machines{edges{node{config}}}}}"}'
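If you only need the image, here is a minimal sketch that filters that response down to config.image for each Machine. It assumes jq is installed, and the helper name extract_images is hypothetical:

```shell
# Print config.image for every Machine in a GraphQL response read from stdin.
# Assumes jq is installed; extract_images is a hypothetical helper name.
extract_images() {
  jq -r '.data.app.machines.edges[].node.config.image'
}

# Usage: append "| extract_images" to the curl command above.
```

The printed value is what goes into --image-ref when restoring.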

edit: corrected the error mentioned in the post below

I needed to use my organisation slug for the tokens create line.

output:

{
  "data": {
    "app": {
      "machines": {
        "edges": [
          {
            "node": {
              "config": {
                "env": {
                  "PRIMARY_REGION": "mad"
                },
                "init": {
                  "cmd": [],
                  "tty": false,
                  "exec": [],
                  "entrypoint": []
                },
                "size": "shared-cpu-1x",
                "image": "flyio/postgres:14.4",
                "image_ref": {
                  "registry": "docker-hub-mirror.fly.io",
                  "repository": "flyio/postgres",
                  "tag": "14.4",
                  "digest": "sha256:0aa49da884754d5e46d3d465738a20952d86f75970bc2fd0ef3f654669b9f2e4",
                  "labels": {
                    "fly.version": "v0.0.33",
                    "fly.app_role": "postgres_cluster",
                    "fly.pg-version": "14.4-1.pgdg110+1"
                  }
                },
                "restart": {
                  "policy": "RESTART_POLICY_ALWAYS",
                  "max_retries": 0,
                  "gpu_bid_price": 0
                },
                "metadata": {
                  "managed-by-fly-deploy": "true"
                },
                "mounts": []
              }
            }
          }
        ]
      }
    }
  }
}

The image_ref field gives me a clue, but I’m not yet sure what the next step is :thinking:

Thanks again for taking the time to take a look at this.

Oops, sorry, yes.

Follow the instructions here (you linked to the same page earlier; the image ref seems to be all you were missing to get this working):

In other words, run:

# --image-ref is from config['image'] in the response you got from the GraphQL API request above
fly postgres create --snapshot-id <snapshot-id> --image-ref flyio/postgres:14.4

substituting the relevant snapshot ID.

I love you. Thanks a million.
I wasn’t able to attach the new Postgres app to my app, but I could pg_dump it. Nice…
Now I can create a new app, since it’s super easy with Fly, and I’m back up and running.
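For anyone who lands here with the same problem, here is a minimal sketch of taking that pg_dump from a local machine once the restored cluster is running, by tunnelling to it with fly proxy. It assumes bash, a local pg_dump, the postgres superuser, and PGPASSWORD set to the password printed by fly postgres create; the helper names and local port 15432 are hypothetical choices:

```shell
# Sketch: dump a Fly Postgres database to a local file through a temporary
# fly proxy tunnel. Assumptions: bash, pg_dump installed locally, the
# "postgres" superuser, and PGPASSWORD exported with the cluster password.
# Helper names and the default port 15432 are hypothetical choices.

dump_filename() {
  # Timestamped custom-format dump name, e.g. mydb-20240101T120000Z.dump
  printf '%s-%s.dump\n' "$1" "$(date -u +%Y%m%dT%H%M%SZ)"
}

dump_via_proxy() {
  local app="$1" db="$2" port="${3:-15432}"
  local out proxy_pid status
  out="$(dump_filename "$db")"

  # Forward local $port to port 5432 on the Postgres app, in the background.
  fly proxy "$port:5432" -a "$app" &
  proxy_pid=$!
  sleep 5  # crude wait for the tunnel to come up

  # -Fc writes pg_dump's custom format, restorable later with pg_restore.
  pg_dump -h localhost -p "$port" -U postgres -Fc -f "$out" "$db"
  status=$?

  kill "$proxy_pid"  # close the tunnel whether or not the dump succeeded
  return "$status"
}

# Usage (placeholders): PGPASSWORD=<password> dump_via_proxy <restored-pg-app> <database>
```

The fixed sleep is the weakest part; on a slow connection you may need to wait longer before pg_dump connects.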

If you come to Lisbon, let me know and I’ll buy you a coffee :smiley:

I think all attaching does is set a secret (probably DATABASE_URL), so you might be able to attach manually. Either way, it sounds like you’re back on track!
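A minimal sketch of attaching by hand, assuming the secret really is DATABASE_URL, that your app reaches Postgres over Fly’s private network at <pg-app>.internal:5432 without TLS (sslmode=disable), and that the password needs no URL-encoding. pg_url is a hypothetical helper:

```shell
# Build a DATABASE_URL for an app that talks to Postgres over Fly's private
# network. Assumptions: the Postgres app is reachable at <pg-app>.internal:5432,
# TLS is not used on that network (sslmode=disable), and the password contains
# no characters that need URL-encoding. pg_url is a hypothetical helper name.
pg_url() {
  local user="$1" password="$2" pg_app="$3" db="$4"
  printf 'postgres://%s:%s@%s.internal:5432/%s?sslmode=disable\n' \
    "$user" "$password" "$pg_app" "$db"
}

# Usage (placeholders, not real values):
#   fly secrets set DATABASE_URL="$(pg_url <user> <password> <pg-app> <db>)" -a <your-app>
```

Compare the result with the connection string your old attachment used, if you still have it, before relying on it.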

Sorry to nag, but IMMEDIATELY sounds like a good time to take some offsite backups :squinting_face_with_tongue: After that, I think you could raise your snapshot retention from the rather low value of 5 days.

I’d also suggest setting up a container that regularly takes a database dump and pushes it somewhere other than Fly. It really is a good use of your time.
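A minimal sketch of the core of such a job, in bash, assuming DATABASE_URL is set, dumps go to an S3-compatible bucket named by a hypothetical BACKUP_BUCKET variable, and pg_dump plus the AWS CLI are installed in the container image:

```shell
# Sketch of a backup job's core for a small container running on a schedule.
# Assumptions: bash, pg_dump and the AWS CLI in the image, DATABASE_URL set,
# and BACKUP_BUCKET (a hypothetical variable) naming an S3-compatible bucket
# outside Fly. backup_key and backup_once are hypothetical helper names.

backup_key() {
  # Object key such as backups/db-20240101T120000Z.dump
  printf 'backups/db-%s.dump\n' "$(date -u +%Y%m%dT%H%M%SZ)"
}

backup_once() {
  : "${DATABASE_URL:?DATABASE_URL must be set}"
  : "${BACKUP_BUCKET:?BACKUP_BUCKET must be set}"
  local key tmp status
  key="$(backup_key)"
  tmp="$(mktemp)"

  # pg_dump accepts a connection URI in place of a database name.
  pg_dump --format=custom --file="$tmp" "$DATABASE_URL" &&
    aws s3 cp "$tmp" "s3://${BACKUP_BUCKET}/${key}"
  status=$?

  rm -f "$tmp"  # always remove the local copy
  return "$status"
}

# Example schedule: one dump per day, forever.
#   while true; do backup_once; sleep 86400; done
```

For a non-AWS provider you would likely add the AWS CLI’s --endpoint-url option, and it is worth test-restoring a dump with pg_restore now and then.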
