Sunsetting Nomad

Hi! It’s been a minute since our last migration check-in.

Over the past few months, we’ve done a lot of talking about migrating everyone off of our old Nomad-based Apps platform and onto the newer Fly Launch platform, built on Machines. We’re nearing completion on that.

At the time I’m writing this, we’re a little bit over 96% migrated. As we sprint to finish as much of the last four percent as we can migrate from our side, we’ve also been looking towards the future (and what this means for the remaining Nomad apps).

We released the manual migration tool at the start of April, and by May, we’d started migrating simple apps automatically. We also pulled the plug on creating new Nomad apps in May.

It’s time to Get Off Nomad for good.

Over the next two months, we want to get rid of Nomad. Like, stop-provisioning-it-on-our-servers get rid of it. We’re going to be running more automated migrations, but anyone who still has Nomad apps should try to get them migrated by Oct 1st. (If this is you, you should be getting an email about this very soon!)

We’ve been working hard to ensure that existing V1 workflows transfer seamlessly to V2. If there’s something we missed or any unaddressed concerns that are blocking you from migrating, please let us know!

At the risk of sounding like a broken record, here’s an explanation of the migration process:

  1. Open a terminal inside the directory with your app’s fly.toml file. (If you don’t have a fly.toml file, cd into an empty directory and run fly config save -a <app_name> to get one.)

  2. Just run fly migrate-to-v2!
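
For anyone scripting this across several apps, the two steps above could be wrapped roughly like this. It is only a sketch: run, migrate_app, and the DRY_RUN switch are made-up helper names, not part of flyctl, and by default it prints the commands instead of executing them.

```shell
# Sketch only: run, migrate_app and DRY_RUN are illustrative names,
# not flyctl features.

# Print the command when DRY_RUN is 1 (the default here); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: migrate_app <app_name>   (call it from the app's directory)
migrate_app() {
  app_name="$1"
  # Step 1: fetch a fly.toml if the current directory doesn't have one.
  if [ ! -f fly.toml ]; then
    run fly config save -a "$app_name"
  fi
  # Step 2: migrate the app described by fly.toml.
  run fly migrate-to-v2
}
```

Calling migrate_app my-app (my-app being a placeholder) shows what would be run; DRY_RUN=0 migrate_app my-app actually runs it.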

Everything should be taken care of automatically, but if you run into any issues, feel free to reply with them in this thread for some troubleshooting help :slight_smile:

TLDR: Please migrate your apps by Oct 1st! We want to get rid of Nomad entirely within the next couple months.

I’ve tried the upgrade many times but always get:

Error: can't get role for fdaa:0:c50e:a7b:9d36:89c8:85ad:2: Get "http://fdaa:0:c50e:a7b:9d36:89c8:85ad:2:5500/commands/admin/role": connect tcp [fdaa:0:c50e:a7b:9d36:89c8:85ad:2]:5500: connection was refused.

Any idea why or how to get around it?

I get the same error.

@brycethornton, @julia: I believe this means your Postgres image is out of date.

Best practice would be to make a backup of your database before a big upgrade. You could use our daily backups, but if your backup needs more granularity, you should be able to chain fly proxy <port> -a <app> with pg_dump -h localhost -p <port> -U postgres <db_name> > backup.sql to obtain a local backup.
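
Chaining the two commands might look something like the sketch below. backup_db and PROXY_WAIT are illustrative names, and the fixed wait for the tunnel is an assumption rather than anything flyctl guarantees; the fly proxy and pg_dump invocations are the ones quoted above.

```shell
# Sketch only: backup_db and PROXY_WAIT are illustrative names.
# Usage: backup_db <app> <port> <db_name> [output_file]
backup_db() {
  app="$1"; port="$2"; db_name="$3"; out_file="${4:-backup.sql}"

  # Open the tunnel in the background and remember its PID.
  fly proxy "$port" -a "$app" &
  proxy_pid=$!

  # Crude fixed wait for the tunnel to come up (assumption: a few
  # seconds is enough; adjust PROXY_WAIT if it is not).
  sleep "${PROXY_WAIT:-3}"

  # Dump through the tunnel; pg_dump may prompt for the postgres password.
  status=0
  pg_dump -h localhost -p "$port" -U postgres "$db_name" > "$out_file" || status=$?

  # Always close the tunnel, then report pg_dump's exit status.
  kill "$proxy_pid" 2>/dev/null || true
  wait "$proxy_pid" 2>/dev/null || true
  return "$status"
}
```

For example, backup_db my-db 5432 my_database would write backup.sql in the current directory (both names are placeholders).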

Once that’s done (or not, if you choose to skip the backup), you can run fly image update -a <app_name> to update your postgres image.

After that, migrations should go smoothly.

@allison I’ve tried that. For my “app-db” app it says “No changes to apply”. For my main app (the Nomad app) it says “Error: image is not eligible for automated image updates”.

I got an email an hour ago saying “you have an app that is still running on Nomad”, but when I run fly migrate-to-v2 as directed, I get Error: the app '[my app name]' is already on the apps v2 platform.

I’m guessing I should trust the command output over the email?

(I had an email July 27 saying my app would be migrated the following week.)

I’d double-check with fly status -a <appname>, but I’m triggering migration batches in between replies right now, so it most likely got caught in the latest wave :slight_smile:

fly status -a <appname> includes Platform = machines, so looks like I’m good.
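
If you want to check this in a script, a small helper along these lines might do. is_on_machines is a made-up name, and the pattern assumes fly status keeps printing the line in the "Platform = machines" form shown here.

```shell
# Sketch only: is_on_machines is an illustrative helper name.
# Succeeds if `fly status` reports "Platform = machines" for the app.
# Usage: is_on_machines <app_name>
is_on_machines() {
  fly status -a "$1" | grep -Eq 'Platform[[:space:]]*=[[:space:]]*machines'
}
```

Something like is_on_machines my-app && echo "already migrated" would then tell you whether there is anything left to do (my-app is a placeholder).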

Your app was being misidentified as a database somehow! We’ve changed that, so you should be able to migrate now.

@allison That worked! :tada: I’ve been fighting with that for so long. Thanks!

@allison the app I’m trying to upgrade (mess-with-dns) isn’t a database, so I’m confused about why I’d need to update the Postgres image. (it has a database, but it isn’t a database itself)

Ah, same issue. Should be fixed now!

When I ran the migration, I got this error:

failed while migrating: smoke checks for 6e82d69df515d8 failed: the app appears to be crashing
? Would you like to enter interactive troubleshooting mode? If not, the migration will be rolled back. Yes

Oops! We ran into issues migrating your app.
We're constantly working to improve the migration and squash bugs, but for
now please let this troubleshooting wizard guide you down a yellow brick road
of potential solutions...
               ,,,,,
       ,,.,,,,,,,,, .
   .,,,,,,,
  ,,,,,,,,,.,,
     ,,,,,,,,,,,,,,,,,,,
         ,,,,,,,,,,,,,,,,,,,,
            ,,,,,,,,,,,,,,,,,,,,,
           ,,,,,,,,,,,,,,,,,,,,,,,
        ,,,,,,,,,,,,,,,,,,,,,,,,,,,,.
   , ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

The app's platform version is 'detached'
This means that the app is stuck in a half-migrated state, and wasn't able to
be fully recovered during the migration error rollback process.

Fixing this depends on how far the app got in the migration process.
Please use these tools to troubleshoot and attempt to repair the app.
No legacy Nomad VMs found. Setting platform version to machines/Apps V2.

I managed to get the app to stop crashing by giving it 2x more memory than it had before (still not sure what’s going on with that) but this error message is pretty confusing – what does “Please use these tools to troubleshoot and attempt to repair the app.” refer to?

I think my app has two instances running, maybe because it’s in a half-migrated state? Would love any guidance. Here’s what I see in fly logs (indicating that there’s more than one instance running):

2023-09-15T18:13:57Z app[6e82d69df515d8] iad [info]Logged request
2023-09-15T18:13:57Z app[063b048b] iad [info][    5.260957] Out of memory: Killed process 317 (mess-with-dns) total-vm:1147424kB, anon-rss:188908kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:512kB oom_score_adj:0

fly machine list only lists one machine though (6e82d69df515d8) so I don’t know what 063b048b is or how to make it go away.

Typically when you end up there, you get dropped into a sort of “troubleshooting wizard” style interface. Occasionally, though, it can figure out what to do on its own and fixes the app itself. It seems like it might’ve missed the mark here.

The long ID (6e82..) is a Machine. The short one (063b..) is a Nomad alloc. Our admin panel shows that that alloc just shut down, so you should be good now. Sorry for the rocky start, but it looks okay now!
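
To spot leftover allocs in log output, a filter like the one below could help. classify_log_ids is a made-up helper, and the length rule (14 hex characters for a Machine, 8 for a Nomad alloc) is only an assumption generalized from the two IDs in this thread.

```shell
# Sketch only: classify_log_ids is an illustrative helper. The length
# rule below is an assumption based on the two IDs seen in this thread.
# Reads log lines on stdin and prints each distinct app[...] ID once,
# followed by a guess at what kind of instance it is.
classify_log_ids() {
  awk '
    match($0, /app\[[0-9a-f]+\]/) {
      id = substr($0, RSTART + 4, RLENGTH - 5)   # strip "app[" and "]"
      if (id in seen) next
      seen[id] = 1
      if (length(id) == 14)      kind = "machine"
      else if (length(id) == 8)  kind = "nomad-alloc"
      else                       kind = "unknown"
      print id, kind
    }'
}
```

Piping fly logs into classify_log_ids should then list each ID once, in the order it first shows up.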

I still see the Nomad one running as of 5 seconds ago, it’s constantly crash looping (edit: but maybe not anymore?)

Ahh okay, thanks. I couldn’t see that on the backend.

You should be able to run fly apps set-platform-version detached; fly scale count 0 to scale the Nomad alloc back down to zero. Then just run fly apps set-platform-version machines to go back (you might have to wait a minute before this command will succeed, because it shouldn’t allow changing the version to machines with running allocs).
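
Since the last command may need a few tries, the whole sequence could be sketched with a retry loop. recover_detached is a made-up helper name, the defaults (6 attempts, 10 seconds apart) are just a guess at "wait a minute", and the sketch does not account for any confirmation prompts the commands might show.

```shell
# Sketch only: recover_detached is an illustrative helper name; the fly
# commands are the ones given above. Run it from the app's directory.
# Usage: recover_detached [max_attempts] [seconds_between_attempts]
recover_detached() {
  max_attempts="${1:-6}"
  delay="${2:-10}"

  # Go back to the detached state and scale the Nomad alloc down to zero.
  fly apps set-platform-version detached || return 1
  fly scale count 0 || return 1

  # Switching back to machines can fail while allocs are still running,
  # so retry a few times before giving up.
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if fly apps set-platform-version machines; then
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 1
}
```

It returns non-zero if the platform version still can't be set after the last attempt, so you can tell when manual follow-up is needed.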

that worked, thank you! Looks like the scale was still set to 1 but now it’s 0.

I’m still seeing errors from the old Nomad instance in my Fly logs for some reason but I think I’ll wait 24 hours and see if the old Nomad instance eventually goes away.

Hi, I have a postgres DB that I use for a hobby project (nerves weather workstation) and I get this error when I try to migrate to V2:

➜  fly migrate-to-v2
Error: can't get role for 1234:0:1234:a7b:1234:7:1234:2: invalid character 'p' looking for beginning of value

(exact value changed for security reasons)

Does this ring any bells?

I have a problem when trying the upgrade. When running fly migrate-to-v2 I get the following error: can't get role for fdaa:0:eb6a:a7b:67:f19e:32cc:2: Get "http://fdaa:0:eb6a:a7b:67:f19e:32cc:2:5500/commands/admin/role": connect tcp [fdaa:0:eb6a:a7b:67:f19e:32cc:2]:5500: connection was refused

When I run fly image update -a floatify, I get the error: image is not eligible for automated image updates

Any tips that could help?