Sunsetting Nomad

woylie · October 10, 2023, 4:47am

I tried that. fly migrate-to-v2 displayed this at some point:

INFO Using wait timeout: 5m0s lease timeout: 13s delay between lease refreshes: 4s

failed while migrating: Process group 'app' needs volumes with name 'pg_data_machines' to fullfill mounts defined in fly.toml; Run `fly volume create pg_data_machines -r REGION` for the following regions and counts: nrt=1

So I ran the mentioned command and then ran fly migrate-to-v2 again, which ended with:

==> Migrating ...-db to the V2 platform
>  Upgrading postgres image
>  Setting postgres primary to readonly
>  Creating new postgres volumes
>  Locking app to prevent changes during the migration
>  Enabling machine creation on app
>  Creating an app release to register this migration
>  Starting machines
INFO Using wait timeout: 5m0s lease timeout: 13s delay between lease refreshes: 4s

Updating existing machines in '...-db' with rolling strategy

-------
 ⠋ Waiting for 1234567890 [app] to become healthy: 1/3
-------
failed while migrating: timeout reached waiting for healthchecks to pass for machine 1234567890 failed to get VM 1234567890: Get "https://api.machines.dev/v1/apps/...-db/machines/1234567890": net/http: request canceled
? Would you like to enter interactive troubleshooting mode? If not, the migration will be rolled back. (Y/n)

Hitting Y returns:

Oops! We ran into issues migrating your app.
We're constantly working to improve the migration and squash bugs, but for
now please let this troubleshooting wizard guide you down a yellow brick road
of potential solutions...
               ,,,,,
       ,,.,,,,,,,,, .
   .,,,,,,,
  ,,,,,,,,,.,,
     ,,,,,,,,,,,,,,,,,,,
         ,,,,,,,,,,,,,,,,,,,,
            ,,,,,,,,,,,,,,,,,,,,,
           ,,,,,,,,,,,,,,,,,,,,,,,
        ,,,,,,,,,,,,,,,,,,,,,,,,,,,,.
   , ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

The app's platform version is 'detached'
This means that the app is stuck in a half-migrated state, and wasn't able to
be fully recovered during the migration error rollback process.

Fixing this depends on how far the app got in the migration process.
Please use these tools to troubleshoot and attempt to repair the app.

fabiandittrich · October 10, 2023, 10:27am

When I run fly migrate-to-v2 I get a timeout error, and when I choose the interactive troubleshooting mode I get the options shown in the screenshot. I'm not sure which option to choose so that my data isn't lost. Any help would be appreciated. Thanks

allison · October 10, 2023, 8:07pm

Hi! I looked up your app in the backend. It seems like the issue was just network trouble - your health checks for the migrated machines are passing and it looks like everything is running.

You can run fly migrate-to-v2 troubleshoot to return to that screen if you’ve since closed flyctl, and you’ll want to select “Destroy remaining Nomad VMs and use Apps V2.” For what it’s worth, even if something were to go wrong, you’d still be able to recover the old VMs, so don’t worry about losing anything.

allison · October 10, 2023, 8:09pm

woylie:

failed while migrating: Process group 'app' needs volumes with name 'pg_data_machines' to fullfill mounts defined in fly.toml; Run `fly volume create pg_data_machines -r REGION` for the following regions and counts: nrt=1

I’m so sorry for the confusion here - that definitely should not be printed during a migration, and its suggestion is incorrect. (Internally, migrate-to-v2 calls fly deploy, which is where that error comes from, but it’s not applicable to a migration. Definitely a bug.)

To repair things, you should run fly migrate-to-v2 troubleshoot to get back to that troubleshooting wizard, and choose the option that says something along the lines of “destroy existing Machines and use Nomad.” Additionally, you should delete the volume you created - it’ll be the one with the suffix _machines.
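For picking out the volume to delete, here is a rough sketch. It assumes that `fly volumes list` prints one volume per line with the volume name somewhere on that line; the helper name `filter_machines_volumes` is made up for illustration and is not part of flyctl.

```shell
#!/bin/sh
# Hypothetical helper: keep only the lines of a volume listing that
# mention a name ending in "_machines".
filter_machines_volumes() {
  grep -E '_machines([[:space:]]|$)'
}

# Only talk to Fly when flyctl is actually installed.
if command -v fly >/dev/null 2>&1; then
  # Lists candidate volumes; nothing is deleted here.
  fly volumes list | filter_machines_volumes
fi
```

Once you have double-checked the ID of the volume you created by hand, `fly volumes destroy <volume-id>` removes it. The sketch deliberately stops at listing so that nothing is destroyed automatically.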

I’m going to check and see what went wrong here, and what we can do to make that migration work. In the meantime, those instructions should get you back on Nomad and running stable.

woylie · October 12, 2023, 10:09am

Alright, done that.

miguel-s · October 12, 2023, 1:17pm

Hi @allison
Did you have time to look at the debug logs?

I tried a few other solutions from this thread but always end up with the same Error: 404: 404 page not found.

allison · October 12, 2023, 6:16pm

Ok, that bug should be fixed now! Sorry for the trouble.

Before migrating, run fly version show to double-check that you’re on flyctl v0.1.108, then migrate-to-v2 should work on your database. (it might take a day or so for that to hit Homebrew, if that’s your package manager of choice)
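If you'd rather script that version check, something along these lines might do. It's a sketch, not anything built into flyctl: it assumes the version command prints a string containing `X.Y.Z` somewhere in its output, and that your `sort` supports `-V` (GNU coreutils and recent macOS do). The function names are invented for the example.

```shell
#!/bin/sh
# Pull the first X.Y.Z-looking string out of whatever is piped in.
extract_version() {
  grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Succeed when version $1 is greater than or equal to version $2.
version_at_least() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

required="0.1.108"

# Only run the check when flyctl is installed.
if command -v fly >/dev/null 2>&1; then
  installed=$(fly version | extract_version)
  if version_at_least "$installed" "$required"; then
    echo "flyctl $installed is new enough"
  else
    echo "flyctl $installed is older than $required; update before migrating" >&2
  fi
fi
```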

allison · October 12, 2023, 6:22pm

Hi! I haven’t been able to determine the root cause of this.

In the meantime, if a little bit of downtime is acceptable, you can try:

fly migrate-to-v2 --force-standard-migration

This flag sidesteps all the smart postgres-specific migration code that keeps your db online during the migration, but in exchange there are a lot fewer moving parts (and the specific request that’s failing in those debug logs is skipped).

miguel-s · October 12, 2023, 6:43pm

It worked! Thanks for the support @allison

statusvista · October 12, 2023, 9:18pm

@allison I’m getting the following error running fly migrate-to-v2

DEBUG gqlErr: <nil> agentErr: <nil>

DEBUG flypg will connect to: http://fdda:...:3:5500

DEBUG --> GET http://fdda:...:3:5500/commands/admin/role

DEBUG <-- 500 http://fdda:...:3:5500/commands/admin/role (5.15s)

DEBUG {
  "error": "context deadline exceeded"
}


DEBUG Task manager done
Error: can't get role for fdda:...:3: 500: context deadline exceeded

Any ideas for things to try?

westbrookc16 · October 12, 2023, 11:51pm

I am getting an error when trying to migrate my postgres app to v2. It says it can’t create the volume and returns a status of 503. Not sure what to do.

woylie · October 13, 2023, 4:52am

Thanks, it worked now!

allison · October 13, 2023, 6:52pm

So, that IPv6 address is the IP of the leader node (via VPN into your org’s network) in that Postgres cluster. The endpoints on the node should definitely not be returning 500 errors.

At the same time, there’s a time for us to sit down and figure out what’s causing that, and that time is not 18 days before Nomad gets removed haha. I think, right now, you should double-check that your database is OK. fly pg connect and look around, just make sure things are still working.

If it all looks good, I’d run fly migrate-to-v2 --force-standard-migration. That will cause a couple minutes of downtime, but it’ll get you moved over so you don’t have to worry about any deadlines.

If the downtime won’t work for you, we can look at other options, but I’m inclined to say the simplest option is safest if it won’t cause any significant issues.

allison · October 13, 2023, 6:53pm

Can you try running that again? We’ve had some momentary flickers in the past day or so with volume creation - it might have resolved itself in time.
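For transient failures like this, a small retry wrapper can save some typing. This is a generic shell sketch under my own assumptions, nothing flyctl ships: `retry` is a made-up helper that reruns a command up to a fixed number of times with a pause in between.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    # Stop as soon as the command succeeds.
    "$@" && return 0
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Example with a harmless stand-in command:
retry 3 1 true
```

Because the migration prompts interactively when it fails, it's probably safer to rerun fly migrate-to-v2 by hand and keep a wrapper like this for non-interactive commands.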

statusvista · October 13, 2023, 7:11pm

The same error is returned trying to run fly pg connect. I re-ran flyctl auth login to make sure I recently authenticated.

Error: can't get role for fdda:...:3: 500: context deadline exceeded

westbrookc16 · October 14, 2023, 6:10pm

I tried again and am still getting the same error. I am in region ewr, and I saw a thread about trouble creating volumes in that region. Could that be the problem?

dronda · October 14, 2023, 11:24pm

Hi @allison. Is there any update on this by any chance? I am still getting emails saying that time is running out.

I have since retried the command and now I get

Error: 404: 404 page not found

I tried the --force-standard-migration and that errored with the following:

Making snapshots of volumes for the new machines
failed while migrating: failed to create volume: request returned non-2xx status, 503
? Would you like to enter interactive troubleshooting mode? If not, the migration will be rolled back. Yes
failed while troubleshooting: failed to create volume: request returned non-2xx status, 503
Error: failed to create volume: request returned non-2xx status, 503 (Request ID: 01HCR7ZRJHM0C17Z8TM8XFRJN3-lga)
