My app was not working. After a while I realized the MAD region was deprecated and that my attempt to migrate had failed (the toml file had the configuration, but I failed to scale up).
Now I’m seeing that the postgres app isn’t working and I can’t back up from my volume.
I followed these instructions without success.
I see the list of new volumes, but I can’t link the app to the new ones:
blm git:(master) ✗ fly volumes list -a tekclean-db
ID                    STATE    NAME     SIZE  REGION  ZONE  ENCRYPTED  ATTACHED VM  CREATED AT
vol_493km73182nee234* created  pg_data  1GB   mad     9c7c  true                    3 weeks ago
vol_re88qk3gqnoo691r  created  pg_data  1GB   lhr     4e23  true                    4 hours ago
vol_vpggn6o1mk0y10ev  created  pg_data  1GB   lhr     ac54  true                    3 hours ago
vol_vwjjomg1mz7dz6mr  created  pg_data  1GB   lhr     6d06  true                    3 hours ago
➜ blm git:(master) ✗ fly volumes snapshots list vol_493km73182nee234
Snapshots
ID                        STATUS   SIZE        CREATED AT  RETENTION DAYS
vs_JbVOZkmg4egXUQ29JV5p   created  1073741824  5 days ago  5
vs_XvLZ9Jy2XM2oc6eoy1G26  created  1073741824  6 days ago  5
vs_QeOZ2oqm30mAhb8KxbMY   created  1073741824  1 week ago  5
vs_XvLZ9Jy2XM2oc6JaPVXq1  created  1073741824  1 week ago  5
vs_4b0K5k1pgJpBU3waPnGw   created  1073741824  1 week ago  5
This is the message I’m seeing:
Some of your apps in MAD region are on a host has suffered irreparable hardware damage. Please migrate your Fly Machines to other hosts and restore volumes from backups. You are not being charged for this resources on this host.
I’m open to paying for support and opening an organization plan if you think it will help.
I don’t have a direct answer for your question, but would suggest you copy any snapshots to somewhere safe immediately. We have had people here reporting a problem with their 5-day snapshots 10 days after host failure, and of course they lost all their snapshots.
I believe you can increase the retention days to 30, though out of an abundance of caution, I would take the remote backups first.
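For reference, raising the retention could be sketched as below. This assumes a flyctl version whose `fly volumes update` accepts `--snapshot-retention` (in days). The helper name `bump_retention` is made up for illustration, and I can't confirm whether the update is accepted while the volume's host is unreachable.

```shell
# Hypothetical helper: raise snapshot retention (in days) on one volume.
# Assumes `fly volumes update` supports --snapshot-retention in your flyctl version.
bump_retention() {
  volume_id="$1"
  days="$2"
  fly volumes update "$volume_id" --snapshot-retention "$days"
}
```

Usage would be something like `bump_retention vol_493km73182nee234 30`, followed by `fly volumes snapshots list vol_493km73182nee234` to see whether the RETENTION DAYS column changed.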
I can’t pay for support. I just converted to a pay-as-you-go plan, but it’s only effective starting October 1st, and until then, I’m not allowed to pay for support.
I followed these instructions to restore my DB app from one of the available snapshots. However, since there is an open incident with the MAD region, I am unable to list the image-ref, which is necessary for the restoration process. I also searched for a way to download the snapshot to a local drive so I could store it, but I couldn’t find one.
I’m a bit stressed because no one likes to lose about 1.5 years of data (that’s how old my last local pg_dump is). I feel bad I didn’t make more dumps.
pg_dump can’t connect to the database, also because of the incident.
Just to confirm, if you do fly m ls -a YOUR_POSTGRES_APP_NAME, are there no rows in that table?
It sounds like this might be because you’re scaling an app based in mad, which no longer allows new machines. It might be worth trying fly scale count --region lhr 1 (it sort of looks like you did that already given you have volumes in lhr).
edit: seems fly scale count is not expected to work for unmanaged PG apps
1 machines have been retrieved from app tekclean-db.
View them in the UI here
tekclean-db
ID NAME STATE CHECKS REGION ROLE IMAGE IP ADDRESS VOLUME CREATED LAST UPDATED PROCESS GROUP SIZE
6e82570b2259e8* young-frost-254 started mad
* These Machines' hosts could not be reached.
! WARNING: There are active host issues affecting your app. Please check `fly incidents hosts list` or visit your app in https://fly.io/dashboard
And yes, for unmanaged PG apps I get:
Thanks
➜ blm git:(master) ✗ fly scale count 1 -a tekclean-db --region lhr
Error: failed to grab app config from existing machines, error: could not create a fly.toml from any machines :-(
No machines configured for this app
Okay, as an alternative, I think you can grab the image details from the GraphQL API. Try the following curl command, replacing YOUR_ORG_SLUG with the org slug for the organisation containing your UPG app, and YOUR_POSTGRES_APP_NAME with your UPG app’s slug:
Follow the instructions here (you linked to this same page earlier, the image ref was seemingly all you were missing to get this to work):
In other words, run:
# --image-ref is from config['image'] in the response you got from the GraphQL API request above
fly postgres create --snapshot-id <snapshot-id> --image-ref flyio/postgres:14.4
I love you. Thanks a million.
I wasn’t able to attach the new postgres to the app, but I could pg_dump it. Nice….
Now I can create a new app, because it’s super easy with fly, and I’m back on.
If you come to Lisbon, let me know and I’ll buy you a coffee.
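For anyone following along, the "restore from snapshot, then pg_dump" step might look roughly like the sketch below. It assumes the restored Postgres app is reachable through `fly proxy`, that `pg_dump` is installed locally, and that you know the database credentials. The function name, the local port 15432, and the `PROXY_WAIT` variable are illustrative choices, not anything from the thread.

```shell
# Hypothetical helper: dump a Fly Postgres database through a local proxy.
# $1 = Postgres app name
# $2 = connection URL pointing at localhost:15432
# $3 = output file
dump_via_proxy() {
  pg_app="$1"
  db_url="$2"
  out_file="$3"

  # Forward local port 15432 to port 5432 on the Postgres app, in the background.
  fly proxy 15432:5432 -a "$pg_app" &
  proxy_pid=$!
  sleep "${PROXY_WAIT:-5}"   # crude wait for the tunnel to come up

  # Custom-format dump, restorable later with pg_restore.
  status=0
  pg_dump --format=custom --file="$out_file" "$db_url" || status=$?

  # Stop the proxy whether or not the dump succeeded.
  kill "$proxy_pid" 2>/dev/null || true
  return "$status"
}
```

Usage would be along the lines of `dump_via_proxy NEW_PG_APP "postgres://postgres:PASSWORD@localhost:15432/DB_NAME" backup.dump`, with the placeholders replaced by your own values.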
I think all attaching does is set a secret, probably DATABASE_URL, so you might be able to manually attach. Either way, sounds like you’re back on track!
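If attaching really does just set a secret, doing it by hand might look like the sketch below. Treat this as a guess, not confirmed behaviour: the secret name `DATABASE_URL`, the `<app>.internal:5432` host form, and the helper name are all assumptions you'd want to check against what your app actually reads.

```shell
# Hypothetical helper: point an app at a Postgres app by setting DATABASE_URL by hand.
# $1 = consuming app, $2 = Postgres app, $3 = user, $4 = password, $5 = database name
attach_manually() {
  app="$1"
  pg_app="$2"
  user="$3"
  password="$4"
  db_name="$5"

  # Assumption: the Postgres app is reachable on the private network as <pg_app>.internal:5432.
  url="postgres://${user}:${password}@${pg_app}.internal:5432/${db_name}"

  fly secrets set "DATABASE_URL=${url}" -a "$app"
}
```

Note that a password containing characters such as `@` or `/` would need URL-encoding before being placed in the URL; the sketch does not handle that.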
Sorry to be a nag, but it sounds like IMMEDIATELY would be a good time to take some offsite snapshots. And then I think you could bump up your retention days from the rather low value of 5.
I’d also suggest setting up a container that takes a database dump and pushes it to somewhere other than Fly. It really is a good use of your time.
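A minimal sketch of such a job, assuming the container has `pg_dump` and the AWS CLI installed and that credentials for an S3-compatible bucket are supplied through the environment. The bucket name, key layout, and function name are made up for illustration; any storage outside Fly and any upload tool would do.

```shell
# Hypothetical one-shot backup: dump the database and copy the file off Fly.
# $1 = database URL
# $2 = destination prefix, e.g. s3://my-backups/tekclean (made-up bucket)
backup_once() {
  db_url="$1"
  dest_prefix="$2"
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  dump_file="${TMPDIR:-/tmp}/pg-${stamp}.dump"

  # Stop at the first failure so a failed dump is never uploaded.
  pg_dump --format=custom --file="$dump_file" "$db_url" || return 1
  aws s3 cp "$dump_file" "${dest_prefix}/pg-${stamp}.dump" || return 1

  # Remove the local copy only after the upload succeeded.
  rm -f "$dump_file"
}
```

You'd run something like `backup_once "$DATABASE_URL" s3://my-backups/tekclean` on a schedule. Timestamped file names keep older dumps from being overwritten, so one bad run doesn't destroy the previous good backup.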