fly migrate-to-v2 - postgres edition

Last week we announced the new fly migrate-to-v2 command. This is the first in a series of updates that aim to make migrate-to-v2 capable of migrating all kinds of apps. As of flyctl v0.0.510, migrate-to-v2 should Just Work :tm: with your v1 postgres apps. That said, I’m sure bugs will be found, so please report them when you find them.
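If you want to confirm your flyctl is new enough before trying this, a minimal sketch of the comparison is below. The version_ge helper is hypothetical (not part of flyctl), the installed value is a placeholder you would replace with the number fly version prints, and it assumes your sort supports -V.

```shell
# Sketch: check that a flyctl version string is at least 0.0.510.
# version_ge is a hypothetical helper, not part of flyctl.
version_ge() {
  # True when $1 >= $2, using version-aware sorting (assumes `sort -V` exists).
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder: replace with the version number that `fly version` prints.
installed="0.0.510"

if version_ge "$installed" "0.0.510"; then
  echo "ok: migrate-to-v2 should handle v1 postgres apps"
else
  echo "too old: update flyctl first"
fi
```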

How it works

It works just like migrating a regular app, with some exceptions:

  • It will upgrade your pg image first
  • It sets your pg to readonly
  • It will create new volumes (one for each of your existing ones)
  • It will wait until at least one new replica in your PRIMARY_REGION is fully synced before scaling down the old nomad vms
  • It disables readonly
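
From your side, the steps above all happen inside the one command. A rough sketch of running it and then looking at the result might be the following. The migrate_pg helper and the app name are hypothetical, and passing --app to each command is an assumption about how you target the app; running the commands from a directory with the app's fly.toml is the other common way.

```shell
# Sketch: migrate one v1 postgres app, then inspect the result.
# migrate_pg and the app name are hypothetical, for illustration only.
migrate_pg() {
  app="$1"
  # The migrator itself performs the steps listed above
  # (image upgrade, readonly, new volumes, wait for sync, readonly off).
  fly migrate-to-v2 --app "$app" || return 1
  # Afterwards, check the new machines and see which volumes now exist.
  fly status --app "$app"
  fly volumes list --app "$app"
}

# Usage (hypothetical app name):
#   migrate_pg my-pg-app
```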

Caveats

  • The migrator does not automatically delete the old volumes, so you will have to delete them yourself if you want to avoid being charged for them
  • Sometimes, you can end up with a machine that has a release version 0.
    * This is a cosmetic bug, the machine still runs the correct image and has the right configuration.
  • The process will begin whether or not payment is configured, but it cannot finish without a valid payment method.
    * Don’t worry, though! When this happens, your app will be rolled back to how it was before migration was attempted, so nothing breaks.
  • Rarely, the error recovery rollback process leaves your app in a “suspended” state. This is a consequence of the app being in-between nomad and machines for a brief period. This has to be fixed on our backend, but once it’s fixed, this should no longer be an issue.
    * For the time being, if you end up running into issues and your app has to be rolled back to nomad, you might have to fly resume your app. This should fix the issue. Sorry about this!
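
For the manual follow-ups in the caveats above, a hedged sketch is below. The helper names are hypothetical, the volume ID is something you would copy by hand from the list output, and the exact subcommand spelling (volumes delete here) is an assumption that may differ between flyctl releases, so check fly volumes --help first.

```shell
# Sketch of post-migration cleanup; helper names are hypothetical.
list_volumes() {
  # Shows all volumes for the app so you can spot the old ones.
  fly volumes list --app "$1"
}

delete_old_volume() {
  # $1 is a volume ID copied by hand from the list output.
  # Assumption: the subcommand is `delete`; some releases may call it `destroy`.
  fly volumes delete "$1"
}

resume_after_rollback() {
  # Only needed if a rollback left the app in a "suspended" state.
  fly resume "$1"
}
```

Double-check each ID before deleting it, since the new volumes created by the migrator show up in the same list.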

Why not to do this (yet)

  • Your app can’t tolerate any amount of time in readonly mode
  • You are scared of new commands
  • The rest of the equivalent section from last week doesn’t super apply since pg on fly machines is a well tested specific experience

Why you absolutely should do this

  • Your postgres will become way more reliable running on appsv2 than on v1
  • All the reasons from last week’s post

Again, this is new software. There might always be bugs, but we think migrating your postgres apps will make them way more reliable and be a much nicer experience. We’ve already migrated some dbs for our internal tooling, which was really nice.

Happy migrating! Let us know how it goes, if you have any questions, or find any bugs.

I’m already using machines with postgres-ha (Stolon v0.0.34). Can this new feature help me move my production cluster to postgres-flex (repmgr)?

Not yet unfortunately. This is just for moving nomad stolon to machines stolon. We hope to have a migration path from stolon to flex soon.

It wasn’t mentioned in last week’s post, but do the new PG images include PostGIS?

Wasn’t PostGIS added like a year ago?

Edit:

No, you’re right. Just checked my machines PG instance. It’s not there when doing select * from pg_extension;.

PostGIS has been available on standard (non-machines) Fly PG for ages, yes. I depend on it so I’ve been holding off on any migration.

To be clear, this migrates from our stolon based image to our stolon based image. So nothing about what runs in your vm is changed. PostGIS will still be there. This tool is not (yet) migrating to flex.

I feel like I saw somewhere that TimescaleDB was not supported on v2. Is that still the case?

V2 supports Timescale in the same way v1 does. Flex also supports timescale db. But again this migration tool just moves from v1 (stolon on nomad) to v2 (stolon on machines). Not to flex.

Everything seems fine after running the command, but we have been stuck on “Waiting for in region replicas to become healthy” for at least 15 minutes now. We have even manually cloned the newly created machine so that there is a “replica” available. Not sure what to do.

Do I need to set a PRIMARY_REGION in secrets?

@DAlperin - is there a way to manually remove the old v1 instances and unlock the db for writes? We are stuck in limbo now.

Thanks,

Could you share the app name so I can take a look?

bridge-way-fb-capi-db

I’m looking into it.

Thank you, please let me know if there is anything I can do to help.

Ok, I think I see what’s happening here. Are you cool with me manually intervening to fix it for now? @danwetherald

Yes please.

@danwetherald could you run the following commands please?

flyctl m update 6e82dd03c73178 --image flyio/postgres@sha256:57374f9b8964304dc2bbf487eda20982fc0cfaa47a3ef2e36d320826e8d343d4
flyctl m update 6e82dd07f73d98 --image flyio/postgres@sha256:57374f9b8964304dc2bbf487eda20982fc0cfaa47a3ef2e36d320826e8d343d4

On it.

I believe you may have already destroyed that second machine 6e82dd07f73d98

I didn’t. You can go ahead and kill the flyctl migrate-to-v2 process if you haven’t already. I’m gonna clean this up.