fly migrate-to-v2: Apps with volumes support šŸŽ‰

The only way I got past this was by running fly scale count 1. It seems the old autoscale commands do nothing.
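For anyone else hitting the same thing, the workaround looked roughly like this (the app name is a placeholder; run migrate-to-v2 from the directory containing your fly.toml):

$ fly scale count 1 -a my-nomad-app   # pin the app to a single, fixed VM count instead of autoscale
$ fly migrate-to-v2                   # then retry the migration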


Hi! This was a bug on the backend, and should be fixed now. Sorry about that!

Is there any guide on how to migrate a nomad database to v2?
My database instance does not have a fly.toml; it was generated when I created my application.

@allison can you help me?

@rodolfosilva You can pull a fly.toml file from a running app with fly config save. That should be enough to get you migrating :slight_smile:
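For example, assuming your database app is named my-db-app (a placeholder):

$ fly config save -a my-db-app   # writes the running app's config to ./fly.toml
$ fly migrate-to-v2              # run from the same directory so it picks up that fly.toml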

Iā€™m trying to migrate my app to the v2 platform using migrate-to-v2.

It seems to have got stuck at

Waiting for nomad allocs for '<app-name>' to be destroyed

It just hangs forever. Any suggestions? The logs arenā€™t giving me much

Update:

It eventually timed out and rolled back:

==> Migrating matchhaus-prod to the V2 platform
>  Locking app to prevent changes during the migration
>  Enabling machine creation on app
>  Creating an app release to register this migration
>  Starting machines
INFO Using wait timeout: 2m0s lease timeout: 13s delay between lease refreshes: 4s
Updating existing machines in 'matchhaus-prod' with rolling strategy
  Finished deploying
>  Scaling nomad VMs down to zero now that machines are running.
Waiting for nomad allocs for 'matchhaus-prod' to be destroyed
failed while migrating: nomad allocs never reached zero, timed out
==> (!) An error has occurred. Attempting to rollback changes...
>  Setting platform version to 'nomad'
>  Successfully recovered

Now, when I attempt to migrate again, it errors out saying it cannot parse the config. The config hasnā€™t changed since the last attempt, and it doesnā€™t give any information about why it canā€™t be parsed.

$ fly migrate-to-v2 -c fly/fly.prod.toml
Error: We had trouble parsing your fly.toml. Please check https://fly.io/docs/reference/configuration for more information.

Iā€™m trying to migrate Redis apps:
flyctl migrate-to-v2 --config ./fly/fly.dev.redis.toml

or

flyctl migrate-to-v2 --config ./fly/fly.staging.redis.toml
and got the message

failed while migrating: unfortunately the worker hosting your volume vol_18l524yj5oj47zmp (redis_server_foodbank_dev) does not have capacity for another volume to support the migration; some other options: 1) try again later and there might be more space on the worker, 2) run a manual migration (see Manual migration to Apps V2), or 3) wait until we support volume migrations across workers (weā€™re working on it!)

Hi! Would you feel comfortable sharing your fly.toml? If not publicly, could you email support@fly.io? Iā€™d like to take a look at why this is happening.


I actually got it working in the end. It seems there was a problem on Flyā€™s end with the fly.toml, because after the failed migration even running fly config save resulted in the ā€œcouldnā€™t parse configā€ error.

I had to do a fly deploy, which seemed to fix the config. I then tried migrate-to-v2 once more and this time it worked. The logs didnā€™t tell me much, but the app is matchhaus-prod if you wanted to look further.
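For anyone following along, the recovery sequence was roughly:

$ fly deploy -c fly/fly.prod.toml          # redeploy, which regenerated a config flyctl could parse
$ fly migrate-to-v2 -c fly/fly.prod.toml   # retry the migration, which then went through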

FWIW the app was crashing earlier today (after months of being super stable), which I solved with another redeploy. Not sure if there have been some other Fly issues today that havenā€™t surfaced.

Yep, youā€™re right. This was a bug on our end that affected some configs, which @kwaw fixed earlier.

# fly.toml file generated for fly-foodbank-dev-redis on 2021-09-16T13:16:30+04:00

app = "fly-foodbank-dev-redis"

kill_signal = "SIGINT"
kill_timeout = 5
processes = []

[env]

[experimental]
  allowed_public_ports = []
  auto_rollback = true

[[mounts]]
  source      = "redis_server_foodbank_dev"
  destination = "/data"

[[services]]
  http_checks = []
  internal_port = 6379
  processes = ["app"]
  protocol = "tcp"
  script_checks = []

  [services.concurrency]
    hard_limit = 25
    soft_limit = 20
    type = "connections"

  [[services.ports]]
    handlers = []
    port = "10000"

  [[services.tcp_checks]]
    grace_period = "1s"
    interval = "15s"
    restart_limit = 6

We have lower than usual capacity in the region your app is deployed in. Iā€™m gonna copy @senyoā€™s great response to someone about this earlier this month:

One way around this is to create a volume in the same region and then do all the migration steps manually.
Otherwise, if youā€™re not in a rush, you can retry the migration periodically. The worker may have capacity at some point in the future and weā€™ll be able to run the migration successfully.
For what itā€™s worth, this issue comes from the way weā€™ve designed volumes: they are tied to a single host. Weā€™ve got ideas to make this work out of the box, so in the future this shouldnā€™t be a problem.
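As a rough sketch of that first manual step (the app name, volume name, region, and size below are placeholders; the full procedure is in the Manual migration to Apps V2 guide):

$ fly volumes list -a my-app                                    # note the region of the existing volume
$ fly volumes create my_data --region ord --size 10 -a my-app   # create the new volume in that same region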

We are still not able to migrate any volume apps in ORD. Any updates?

To migrate when your current host is full, youā€™ll need remote volume forking.
Itā€™s coming along pretty well! We have cross-host volume forking working, but we still have to move state information through our stack so that flyctl can know when the destination volume is fully hydrated (and therefore safe to use). When we have volume forking ready, thatā€™ll get its own Fresh Produce thread, but one of us will also post about it in here too.

It shouldnā€™t be that far out :slight_smile:

I just migrated my app to v2, and it went incredibly smoothly. Itā€™s quite a simple app, but still, good job on making the migration process so straightforward, guys! :clap:


@allison - any update on the remote volume forking?

Thanks!

We have part of it shipped, but weā€™ve been intentionally quiet about it because thereā€™s still an important piece missing. We have a hidden flag --remote-fork for fly vol fork which should (in theory) work, but we still havenā€™t finished getting volume status exposed properly. That part is what tells us when the copy is finished, or when the volume is safe to mount and use. (which means using that flag right now is very at-your-own-risk - be careful!)
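If you want to experiment with it despite that, the shape of the command is something like this (the volume ID is a placeholder, and again: entirely at your own risk until volume status is exposed):

$ fly vol fork vol_xxxxxxxxxxxx --remote-fork   # forks the volume onto another host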

We should have this properly ready soon! Once we can fork and maintain safety, the feature will truly be announced (including being hooked into migrate-to-v2 and all that) :slight_smile:


Thanks, we will stand by for the final version.

I have migrated my app successfully and it has been working well for 4 days now.
When I run fly vol list I get two volumes, one of which has the _machines suffix.

After youā€™ve migrated your app and verified that it works, you can safely delete the old volumes, the ones without the _machines suffix.
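Concretely, that looks something like this (the app name and volume ID are placeholders; take the ID from the row without the _machines suffix):

$ fly volumes list -a my-app             # find the ID of the old volume, the one without _machines
$ fly volumes destroy vol_xxxxxxxxxxxx   # destroy by ID so only that old volume is removed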

When I try to destroy the old volume, I get the following warning:

Warning! Individual volumes are pinned to individual hosts. You should create two or more volumes per application. Deleting this volume will leave you with 0 volume(s) for this application, and it is not reversible.  Learn more at https://fly.io/docs/reference/volumes/

It says 0 volumes, yet the one with the _machines suffix should remain. Iā€™m afraid Iā€™ll lose my data if I press yes, and in the meantime Iā€™m racking up charges for keeping two volumes.
How do I ensure that the volume with _machines suffix remains and I donā€™t lose data?

I wasnā€™t able to migrate a mini-app with a volume:

ID                  	STATE  	NAME	SIZE	REGION	ZONE	ENCRYPTED	ATTACHED VM	CREATED AT
vol_jlgz1vpzl0e478m3	created	perm	10GB	ams   	8aba	true     	abc845a1   	1 year ago

Error: unfortunately the worker hosting your volume vol_jlgz1vpzl0e478m3 (perm) does not have capacity for another volume to support the migration; some other options: 1) try again later and there might be more space on the worker, 2) run a manual migration (see Manual migration to Apps V2), or 3) wait until we support volume migrations across workers (weā€™re working on it!)

Seems Iā€™ll just have to wait before moving to v2?

I just attempted to migrate, but got the error Error: failed to create volume: size_gb must be between 1GB and 500GB, despite the fact that my volumes are 50GB each (two volumes).

Iā€™m considering ditching this cluster, creating a new one, and dumping/restoring the data. Is that my best way to go?