YUL region deprecation

I received an email overnight saying that the YUL region is being deprecated and that we need to move our apps to another region.

I’m curious - what happens if I don’t? The app in question is a CI build for a side project. I don’t remember ever picking a region for it, and it’s ended up in YUL somehow. I don’t especially care where it’s deployed, so if the deprecation will automatically move the machine to another region then that’s fine by me.

However, I’ve also just added primary_region = "lhr" to the fly.toml file for the app and redeployed it, and that hasn’t done anything. I’m not sure what the process for moving the machines would be, beyond that, if I did need to do it myself?

Cheers

you can move your machines using fly scale:

~ $ fly scale count <ct> --region lhr
~ $ fly scale count 0 --region yul

(replace <ct> with the number of machines you run in your app)
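
For example, if a hypothetical app named my-app were running two machines, moving them out of yul would look like this (my-app and the count of 2 are placeholders for this illustration):

~ $ fly scale count 2 --region lhr -a my-app
~ $ fly scale count 0 --region yul -a my-app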

we will reach out again later to people still running machines in the region once we have a decommission plan.


Well - that was remarkably easy. Thank you :slight_smile:

Hello,
I followed these instructions, but unfortunately my machine did not restart correctly and my DB is completely down. One of the three checks on it is not passing.

I have tried to completely recreate the app and restore the DB from a snapshot, but this also fails with an error saying the volume is not mountable. I am not sure what to do; do you have anything that could point me in the right direction? Thanks!

@killerkiara those instructions only apply to machines without volumes. (Well, they can apply to machines with volumes, if your app knows how to deal with a fresh, empty volume when a new machine is created.) They do not apply, and will not work, with Postgres applications.

For Postgres, there are two options (a sketch of each follows below):

Safest option

  1. fly postgres create --fork-from old-postgres --region new_region
  2. in the app, fly postgres detach old-postgres
  3. in the app, fly postgres attach new-postgres
  4. confirm everything is working well
  5. fly apps destroy old-postgres
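
Putting the safest option together, a minimal sketch; old-postgres, new-postgres, my-app, and lhr are placeholders for your actual Postgres apps, the consuming app, and the target region:

$ fly postgres create --name new-postgres --fork-from old-postgres --region lhr
$ fly postgres detach old-postgres -a my-app
$ fly postgres attach new-postgres -a my-app   # sets a fresh DATABASE_URL secret on my-app
$ fly apps destroy old-postgres                # only once you've confirmed everything works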

Riskier option, with possibly no reconfiguration and less downtime

  1. Regional failover to the new region
  2. Delete all Postgres machines in the old region
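
The riskier option is less of a fixed recipe, since it depends on your cluster layout. One possible shape, assuming a flex (non-managed) cluster and placeholder machine IDs; note there's no guarantee the failover promotes the replica in the region you want, so treat this as a rough outline:

$ fly machine clone <machine-id> --region lhr -a old-postgres   # add a replica in the new region
$ fly postgres failover -a old-postgres                         # promote a replica to primary
$ fly machine destroy <yul-machine-id> -a old-postgres          # then remove the machines left in yul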

If you can share the exact command you used to recreate from snapshot and the exact error you got, it might be easier to help figure out what went wrong there.

Regards,


Hello, thank you so much for your assistance and for clarifying that.
I have now tried a number of things, but so far I have not been able to bring the DB back.

Roughly, here is what I did:

  • From the original postgres app, I used the volume that should hold the latest good data. I got the snapshot from it and ran:

fly postgres create --snapshot-id <snapshot-id> --image-ref <image-version> using the snapshot ID I found. This errored with:

Error: failed to launch VM: internal: could not claim volume for machine: volume is not mountable, unable to claim

I think one issue here was that my volume (and the snapshot I took from it) were still in the old region, while the new app was spun up in the new region.

Secondly, I tried to stay within the same postgres app:

  • I forked the volume
  • I cloned the previous machine and attached the fork to it.

This worked very well and the machine was up and healthy. However, when I tried to attach the web app to it, it did not work, and I got this error when trying to deploy:

10:50:29.877 [error] Postgrex.Protocol (#PID<0.166.0>) failed to connect: ** (DBConnection.ConnectionError) tcp recv (idle): closed

I am out of ideas; I only hope that the data is not lost :confused:

Thanks a lot for your help!

@killerkiara try fly ssh console to one of your Postgres machines. Once in, you can connect with psql -h /run/postgresql -p 5433 -U postgres and poke around to check your data is still there. To get it out, run pg_dump -h /run/postgresql -p 5433 -U postgres your_database_name > dump.sql. Exit the machine, note the machine ID you were connected to, and fetch the dump file with something like fly ssh sftp --machine MACHINE_ID get -a your-postgres-app dump.sql.

This is a very standard Postgres dump and recovery procedure, by the way :slight_smile: A good last resort when Postgres seems to be somewhat up but not allowing connections for some reason.
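
Putting those steps together: your-postgres-app, your_database_name, and MACHINE_ID are placeholders as above, and the final restore into a fresh cluster (new-postgres) is my addition, not part of the instructions:

$ fly ssh console -a your-postgres-app
# inside the machine, dump the database, then exit:
#   pg_dump -h /run/postgresql -p 5433 -U postgres your_database_name > /tmp/dump.sql
#   exit
$ fly ssh sftp --machine MACHINE_ID get -a your-postgres-app /tmp/dump.sql
# to restore into a fresh cluster, proxy a local port to it in one terminal...
$ fly proxy 5432 -a new-postgres
# ...and replay the dump in another; <password> is that cluster's superuser
# password, and the target database is assumed to already exist
$ psql postgres://postgres:<password>@localhost:5432/your_database_name -f dump.sql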

- Daniel

Hey Daniel,
thanks for your latest suggestion. Unfortunately, it does not work, because the two machines on the postgres app are not running and cannot be started; something is very wrong with them. The volumes, though, are there and should have the data. Is there a way to dump the whole DB directly from one of the volumes? Then I could just create a fresh db app and load the dump there, maybe?

Thanks,
Chiara

Apologies if this comes across as facetious - that’s not the intention - but do you have backups from before the migration that can be restored onto a totally clean database? From following this thread, that would seem to be a quicker way to get back to working correctly.


It’s possible that you killed the primary before the replica had a chance to catch up, though, back when you were originally following the fly scale count procedure.

In addition to @grahamcox82’s comment, I would create a pristine volume from one of the snapshots dating before the first attempt at migration. (And then do nothing with that volume itself—not attach any Machines to it—but only work off of forks.) The reason is that snapshots age-off (i.e., auto-delete) after 5 days, and that deadline is fast approaching.

$ fly apps create was-yul
$ fly vol create pristine --region ewr --snapshot-id <snapshot-id> --snapshot-retention 60 -a was-yul
$ fly vol snapshots create <volume-id> -a was-yul  # a fresh snapshot of the new volume buys more time.
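
From there, each recovery attempt can work off a fork of the pristine volume, leaving the original untouched; <pristine-volume-id> stands in for the ID printed by the create command above:

$ fly vol fork <pristine-volume-id> -a was-yul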

Hey @grahamcox82, thanks for chipping in. As facetious as it may sound, there are no (local) backups that I could find… apart from the snapshots. It’s not all too bad, as there is no “real” production data yet, only test data, but a lot of it. So it’s better that I could save it :slight_smile: Anyway, I managed to bring it back now and all is good again.

Thank you!
I managed to fix it all now. Thanks for your help!


Maybe this accepted solution should explicitly say that it only works for machines without volumes. I followed it and ran into a lot of trouble. :slight_smile:
