Get In Losers*, We're Getting Off Nomad

@allison - it appears I have apps that have been automatically migrated to V2 overnight. However, I suspect one of them has failed the migration and is now in some sort of strange state (re: Fly’s back-end).

  1. As far as I can tell (is this a pre-req to/part of the migration process?) the scale count has been reduced to 1, it was previously 2.
  2. I noticed it was reported as suspended, so I restarted it. No dice - still suspended, although it appears it is actually up and working.
  3. I’m not sure if suspended is a valid state for Fly-Nomad app status (is the Nomad equivalent dead)? The platform for the app is still reported as being nomad.
  4. Logs for the app appear to have stopped since the (I assume) failed automated migration.

Though I appreciate the business and reputational imperative in force-migrating people to Apps V2, in the interests of transparency I think it would be useful if Fly could create a new post documenting known feature shortfalls between the two (example from this thread: “bluegreen strategy, with health checks”).

Another example (TBC): with Nomad I believe it was possible, albeit subject to Fly-Nomad deficiencies (potential 15 minute delay/etc), to have a single always(read:mostly)-available instance. In the event of hardware failure Nomad would automatically move the VM to another host(?). I understand that machine instances are tied to hardware - I’m not sure if Apps V2 supports this use case? It would need two machines - but with Fly’s in-house orchestration ensuring only one is running at any point in time, and without reliance on external connections to trigger the proxy to bring up the 2nd/backup machine.

I’ll refrain from speculating too much, but it does sound like we tried to auto migrate your app and it failed.

For a little context, “suspended” is real old terminology from when you could suspend/resume nomad apps, and when machines was first built the “suspended” flag was overloaded to mean a machines app with no machines. During migration, at some point the app had no machines, so that flag got set. It just never got unset when things failed and it tried to restore your app to the previous state. You should be able to fly resume <appname> to get it back in working order.

We’ve seen a few people mention logs being strange after migration attempts. We’re looking into it.

Bluegreen is sadly not supported right now. It’s being looked into, but honestly, most of the people working on the apps platform are working on making sure this migration goes well right now. In the meantime, we do support canary deployments, which are pretty close!
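For anyone wanting to try the canary route in the meantime, here is a minimal sketch. It assumes your flyctl version accepts the --strategy flag on fly deploy; the function name and the app name are placeholders, not anything from this thread.

```shell
# Sketch only: deploy one app with the canary strategy.
# "deploy_canary" and its argument are illustrative names.
deploy_canary() {
  local app="$1"

  # --strategy overrides the deployment strategy for this one deploy.
  fly deploy --app "$app" --strategy canary
}
```

Usage would look like deploy_canary my-app. The same strategy can also be set in fly.toml under [deploy] as strategy = "canary", which may be the closest thing to making it the default for an app.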

As for having a single instance of an app that is relatively resilient, you might be looking for standby machines? Essentially, these are machines that are pointed at another machine, and turn on when their target machine is unreachable. Two caveats here, though: I don’t know what happens when/if the original machine comes back up, and they only get added for processes that do not expose a service.

I already asked about this before, but never got a reply. How do I get rid of this notification?

I neither want nor need more than 1 instance.

And how do I scale down to 0 and back to 1 machine?

I simply run flyctl scale count 0 and then flyctl scale count 1, which immediately fails with this message:

Error: there are no active machines for this app. Run fly deploy to create one and rerun this command

I tried running flyctl deploy a couple of times, but every time I lose my machine settings, e.g. I have a machine with 512MB of RAM, scale down to 0, deploy, and it’s restarted on a machine with 256MB of RAM, which is not enough for startup and gets stuck until I scale the RAM back up manually.

There is a fix for this in the pipeline which should be available in the next flyctl release.


Thanks, one more thing: when deploying from scratch the tool automatically creates two instances:

This used to start just one instance, why did it change, and why is it the default now? How do I change it back?

You can kill one of the machines with fly m destroy and the next fly deploy will respect the machine count.

If you need to scale down to 0 and then scale up, run fly deploy --ha=false , that will launch only one machine if there aren’t any.
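Put together with the memory problem reported earlier in the thread, the scale-to-zero-and-back workaround might look roughly like this. It is a sketch under assumptions: the function name and both arguments are placeholders, and it simply re-applies the memory size with fly scale memory after the deploy, which is the manual step described above.

```shell
# Sketch only: bring an app back from zero machines as a single machine,
# then restore its memory size. Both arguments are placeholders.
redeploy_single_machine() {
  local app="$1" memory_mb="$2"

  # --ha=false launches only one machine when the app currently has none.
  fly deploy --app "$app" --ha=false ||
    echo "deploy reported a failure; restoring memory anyway" >&2

  # Re-apply the memory size, since the new machine may start with less RAM.
  fly scale memory "$memory_mb" --app "$app"
}
```

For example, redeploy_single_machine my-app 512. The deploy step is allowed to fail here because an undersized machine may not pass its checks until the memory is scaled back up.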

Lastly, we recommend running 2 machines but enabling autostart/autostop to reduce costs while keeping availability.

See Automatically starting/stopping Apps v2 instances and Setting a minimum number of instances to keep running when using auto start/stop

Hmmm, I kinda feel like I’ve lost a bit though :sweat_smile:. I’ve been automatically upgraded, and now my deploys don’t work and it has taken my site down.

I’m getting a timeout Error for health checks. I’m running a Rails 7 app.

Just looking through the logs and this seems to be an issue with ActiveSupport.

Does anyone have any suggestions on how I could fix this?

=> Booting Puma
2023-05-25T02:58:55.100 app[9080526b60ee87] sin [info] => Rails 7.0.4 application starting in production
2023-05-25T02:58:55.100 app[9080526b60ee87] sin [info] => Run `bin/rails server --help` for more startup options
2023-05-25T02:58:55.135 app[9080526b60ee87] sin [info] Exiting
2023-05-25T02:58:55.135 app[9080526b60ee87] sin [info] /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/message_encryptor.rb:209:in `rescue in _decrypt': ActiveSupport::MessageEncryptor::InvalidMessage (ActiveSupport::MessageEncryptor::InvalidMessage)
2023-05-25T02:58:55.135 app[9080526b60ee87] sin [info] from /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/message_encryptor.rb:186:in `_decrypt'
2023-05-25T02:58:55.135 app[9080526b60ee87] sin [info] from /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/message_encryptor.rb:160:in `decrypt_and_verify'
2023-05-25T02:58:55.135 app[9080526b60ee87] sin [info] from /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/messages/rotator.rb:22:in `decrypt_and_verify'
2023-05-25T02:58:55.135 app[9080526b60ee87] sin [info] from /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/encrypted_file.rb:104:in `decrypt'
2023-05-25T02:58:55.135 app[9080526b60ee87] sin [info] from /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/encrypted_file.rb:66:in `read'
2023-05-25T02:58:55.135 app[9080526b60ee87] sin [info] from /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/encrypted_configuration.rb:21:in `read'
2023-05-25T02:58:55.135 app[9080526b60ee87] sin [info] from /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/encrypted_configuration.rb:33:in `config'
2023-05-25T02:58:55.135 app[9080526b60ee87] sin [info] from /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/encrypted_configuration.rb:48:in `options'
2023-05-25T02:58:55.135 app[9080526b60ee87] sin [info] from /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/core_ext/module/delegation.rb:303:in `method_missing'

@judel You’re probably deploying with a v1 fly.toml. If that’s the problem, you can run fly config save to pull in a current, v2-compatible fly.toml and then try deploying again.

Ran fly config save, and tried to deploy again, and I’m getting a similar error, but it’s now related to sidekiq.

This is my .toml file.

# fly.toml app configuration file generated for trade on 2023-05-25T08:19:37+04:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#

app = "trade"
kill_signal = "SIGINT"
kill_timeout = "5s"

[experimental]
  auto_rollback = true

[build]
  [build.args]
    BUILD_COMMAND = "bin/rails fly:build"
    SERVER_COMMAND = "bin/rails fly:server"

[processes]
  web = "bin/rails server"
  worker = "bundle exec sidekiq"

[[services]]
  protocol = "tcp"
  internal_port = 8080
  processes = ["web"]

  [[services.ports]]
    port = 80
    handlers = ["http"]
    force_https = true

  [[services.ports]]
    port = 443
    handlers = ["tls", "http"]
  [services.concurrency]
    type = "connections"
    hard_limit = 25
    soft_limit = 20

  [[services.tcp_checks]]
    interval = "15s"
    timeout = "2s"
    grace_period = "1s"
    restart_limit = 0

[[statics]]
  guest_path = "/app/public"
  url_prefix = "/"

It may be that your RAILS_MASTER_KEY secret is not set. If you don’t see it in fly secrets list, can you try setting it again?

Both of the secrets I set before are still there.
RAILS_MASTER_KEY
REDIS_URL

This is what I’m getting in the logs at the moment.

Updating existing machines in 'trade' with rolling strategy
  Machine 3287dd0a642585 [worker] has state: started
  [1/2] Checking that 3287dd0a642585 [worker] is up and running
Smoke checks for 3287dd0a642585 failed: the app appears to be crashing
Check its logs: here's the last lines below, or run 'fly logs -i 3287dd0a642585':
  /app/vendor/bundle/ruby/3.1.0/gems/sidekiq-7.0.3/lib/sidekiq/cli.rb:302:in `require'
  /app/vendor/bundle/ruby/3.1.0/gems/sidekiq-7.0.3/lib/sidekiq/cli.rb:302:in `boot_application'
  /app/vendor/bundle/ruby/3.1.0/gems/sidekiq-7.0.3/lib/sidekiq/cli.rb:42:in `run'
  /app/vendor/bundle/ruby/3.1.0/gems/sidekiq-7.0.3/bin/sidekiq:31:in `<top (required)>'
  /app/vendor/bundle/ruby/3.1.0/bin/sidekiq:25:in `load'
  /app/vendor/bundle/ruby/3.1.0/bin/sidekiq:25:in `<top (required)>'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/cli/exec.rb:58:in `load'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/cli/exec.rb:58:in `kernel_load'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/cli/exec.rb:23:in `run'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/cli.rb:484:in `exec'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/vendor/thor/lib/thor.rb:392:in `dispatch'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/cli.rb:31:in `dispatch'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/vendor/thor/lib/thor/base.rb:485:in `start'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/cli.rb:25:in `start'
  /usr/local/bundle/gems/bundler-2.3.7/exe/bundle:48:in `block in <top (required)>'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/friendly_errors.rb:103:in `with_friendly_errors'
  /usr/local/bundle/gems/bundler-2.3.7/exe/bundle:36:in `<top (required)>'
  /usr/local/bundle/bin/bundle:25:in `load'
  /usr/local/bundle/bin/bundle:25:in `<main>'
  Starting clean up.
  hallpass exited, pid: 514, status: signal: 15
  2023/05/25 04:23:19 listening on [fdaa:1:e40:a7b:81:6ead:f927:2]:22 (DNS: [fdaa::3]:53)
  [    4.159064] reboot: Restarting system
  machine did not have a restart policy, defaulting to restart
  Starting init (commit: 9bb7ee8)...
  Preparing to run: `bundle exec sidekiq` as root
  2023/05/25 04:23:21 listening on [fdaa:1:e40:a7b:81:6ead:f927:2]:22 (DNS: [fdaa::3]:53)
  ActiveSupport::MessageEncryptor::InvalidMessage
  /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/message_encryptor.rb:209:in `rescue in _decrypt'
  /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/message_encryptor.rb:186:in `_decrypt'
  /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/message_encryptor.rb:160:in `decrypt_and_verify'
  /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/messages/rotator.rb:22:in `decrypt_and_verify'
  /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/encrypted_file.rb:104:in `decrypt'
  /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/encrypted_file.rb:66:in `read'
  /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/encrypted_configuration.rb:21:in `read'
  /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/encrypted_configuration.rb:33:in `config'
  /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/encrypted_configuration.rb:48:in `options'
  /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.4/lib/active_support/core_ext/module/delegation.rb:303:in `method_missing'
  /app/config/environments/production.rb:74:in `block in <main>'
  /app/vendor/bundle/ruby/3.1.0/gems/railties-7.0.4/lib/rails/railtie.rb:257:in `instance_eval'
  /app/vendor/bundle/ruby/3.1.0/gems/railties-7.0.4/lib/rails/railtie.rb:257:in `configure'
  /app/config/environments/production.rb:3:in `<main>'
  /app/vendor/bundle/ruby/3.1.0/gems/bootsnap-1.15.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:32:in `require'
  /app/vendor/bundle/ruby/3.1.0/gems/bootsnap-1.15.0/lib/bootsnap/load_path_cache/core_ext/kernel_require.rb:32:in `require'
  /app/vendor/bundle/ruby/3.1.0/gems/zeitwerk-2.6.7/lib/zeitwerk/kernel.rb:38:in `require'
  /app/vendor/bundle/ruby/3.1.0/gems/railties-7.0.4/lib/rails/engine.rb:562:in `block (2 levels) in <class:Engine>'
  /app/vendor/bundle/ruby/3.1.0/gems/railties-7.0.4/lib/rails/engine.rb:561:in `each'
  /app/vendor/bundle/ruby/3.1.0/gems/railties-7.0.4/lib/rails/engine.rb:561:in `block in <class:Engine>'
  /app/vendor/bundle/ruby/3.1.0/gems/railties-7.0.4/lib/rails/initializable.rb:32:in `instance_exec'
  /app/vendor/bundle/ruby/3.1.0/gems/railties-7.0.4/lib/rails/initializable.rb:32:in `run'
  /app/vendor/bundle/ruby/3.1.0/gems/railties-7.0.4/lib/rails/initializable.rb:61:in `block in run_initializers'
  /usr/lib/fullstaq-ruby/versions/3.1.2-jemalloc/lib/ruby/3.1.0/tsort.rb:228:in `block in tsort_each'
  /usr/lib/fullstaq-ruby/versions/3.1.2-jemalloc/lib/ruby/3.1.0/tsort.rb:350:in `block (2 levels) in each_strongly_connected_component'
  /usr/lib/fullstaq-ruby/versions/3.1.2-jemalloc/lib/ruby/3.1.0/tsort.rb:422:in `block (2 levels) in each_strongly_connected_component_from'
  /usr/lib/fullstaq-ruby/versions/3.1.2-jemalloc/lib/ruby/3.1.0/tsort.rb:431:in `each_strongly_connected_component_from'
  /usr/lib/fullstaq-ruby/versions/3.1.2-jemalloc/lib/ruby/3.1.0/tsort.rb:421:in `block in each_strongly_connected_component_from'
  /app/vendor/bundle/ruby/3.1.0/gems/railties-7.0.4/lib/rails/initializable.rb:50:in `each'
  /app/vendor/bundle/ruby/3.1.0/gems/railties-7.0.4/lib/rails/initializable.rb:50:in `tsort_each_child'
  /usr/lib/fullstaq-ruby/versions/3.1.2-jemalloc/lib/ruby/3.1.0/tsort.rb:415:in `call'
  /usr/lib/fullstaq-ruby/versions/3.1.2-jemalloc/lib/ruby/3.1.0/tsort.rb:415:in `each_strongly_connected_component_from'
  /usr/lib/fullstaq-ruby/versions/3.1.2-jemalloc/lib/ruby/3.1.0/tsort.rb:349:in `block in each_strongly_connected_component'
  /usr/lib/fullstaq-ruby/versions/3.1.2-jemalloc/lib/ruby/3.1.0/tsort.rb:347:in `each'
  /usr/lib/fullstaq-ruby/versions/3.1.2-jemalloc/lib/ruby/3.1.0/tsort.rb:347:in `call'
  /usr/lib/fullstaq-ruby/versions/3.1.2-jemalloc/lib/ruby/3.1.0/tsort.rb:347:in `each_strongly_connected_component'
  /usr/lib/fullstaq-ruby/versions/3.1.2-jemalloc/lib/ruby/3.1.0/tsort.rb:226:in `tsort_each'
  /usr/lib/fullstaq-ruby/versions/3.1.2-jemalloc/lib/ruby/3.1.0/tsort.rb:205:in `tsort_each'
  /app/vendor/bundle/ruby/3.1.0/gems/railties-7.0.4/lib/rails/initializable.rb:60:in `run_initializers'
  /app/vendor/bundle/ruby/3.1.0/gems/railties-7.0.4/lib/rails/application.rb:372:in `initialize!'
  /app/config/environment.rb:5:in `<top (required)>'
  /app/vendor/bundle/ruby/3.1.0/gems/sidekiq-7.0.3/lib/sidekiq/cli.rb:302:in `require'
  /app/vendor/bundle/ruby/3.1.0/gems/sidekiq-7.0.3/lib/sidekiq/cli.rb:302:in `boot_application'
  /app/vendor/bundle/ruby/3.1.0/gems/sidekiq-7.0.3/lib/sidekiq/cli.rb:42:in `run'
  /app/vendor/bundle/ruby/3.1.0/gems/sidekiq-7.0.3/bin/sidekiq:31:in `<top (required)>'
  /app/vendor/bundle/ruby/3.1.0/bin/sidekiq:25:in `load'
  /app/vendor/bundle/ruby/3.1.0/bin/sidekiq:25:in `<top (required)>'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/cli/exec.rb:58:in `load'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/cli/exec.rb:58:in `kernel_load'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/cli/exec.rb:23:in `run'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/cli.rb:484:in `exec'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/vendor/thor/lib/thor.rb:392:in `dispatch'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/cli.rb:31:in `dispatch'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/vendor/thor/lib/thor/base.rb:485:in `start'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/cli.rb:25:in `start'
  /usr/local/bundle/gems/bundler-2.3.7/exe/bundle:48:in `block in <top (required)>'
  /usr/local/bundle/gems/bundler-2.3.7/lib/bundler/friendly_errors.rb:103:in `with_friendly_errors'
  /usr/local/bundle/gems/bundler-2.3.7/exe/bundle:36:in `<top (required)>'
  /usr/local/bundle/bin/bundle:25:in `load'
  /usr/local/bundle/bin/bundle:25:in `<main>'
  Starting clean up.
  hallpass exited, pid: 514, status: signal: 15
  2023/05/25 04:23:24 listening on [fdaa:1:e40:a7b:81:6ead:f927:2]:22 (DNS: [fdaa::3]:53)
  [    4.139955] reboot: Restarting system
  machine did not have a restart policy, defaulting to restart
  Starting init (commit: 9bb7ee8)...
  Preparing to run: `bundle exec sidekiq` as root
  2023/05/25 04:23:25 listening on [fdaa:1:e40:a7b:81:6ead:f927:2]:22 (DNS: [fdaa::3]:53)
Error: smoke checks for 3287dd0a642585 failed: the app appears to be crashing

There’s a known bug where an older copy of your RAILS_MASTER_KEY may have been set during migration. Can you try setting it again like: fly secrets set RAILS_MASTER_KEY=$(cat config/master.key)? This will ensure the value is the correct one.
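If you want a small guard around that command, here is a sketch that refuses to set the secret when the local key file is missing or empty. The function name and the default path argument are illustrative; the fly secrets set call is the same one quoted above, with an explicit --app.

```shell
# Sketch only: re-set RAILS_MASTER_KEY from the local key file.
# "reset_master_key" is an illustrative name; the key path defaults to
# the standard Rails location.
reset_master_key() {
  local app="$1" key_file="${2:-config/master.key}"

  # Avoid overwriting the secret with an empty value.
  if [ ! -s "$key_file" ]; then
    echo "missing or empty key file: $key_file" >&2
    return 1
  fi

  fly secrets set --app "$app" RAILS_MASTER_KEY="$(cat "$key_file")"
}
```

Run it from the Rails project root, for example reset_master_key trade.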


Not to be funny, but perhaps a smaller number of apps should have been included in the migration.

Also, I would have appreciated more emails warning of this; if there was one, I missed it.


Resetting the RAILS_MASTER_KEY has worked. Thank you! I always struggle with this type of stuff (devops/infrastructure). Where could I have found out about this known bug, or any others?

The only way we noticed that some env secrets were not correct was that we had offsite backups and compared against them.


I also stopped receiving logs on my v2 app. But I think this is a broader issue. My colleague ran into an issue where logs stopped coming in on a v1 app.


Hopefully you can keep the documentation up to date. Several parts of it still say that V2 apps do not support the canary strategy. Will it be possible to make that the default deployment strategy, just like it was in the v1 apps?

So, as somebody not reading all forum posts, I was a little bit surprised by this migration. Apparently, 15-20 of our apps were migrated overnight. Since I have not heard any complaints yet, I guess it worked fine.

So I went ahead and updated our .toml files by running fly config save, and I wanted to ask whether it is a known issue that [build.args] are not included in the saved file?


It’s been known that [build] is not stored remotely, but that behavior is particularly bad when paired with automatic migrations.
The latest release of flyctl offers to merge in the [build] section from an existing config when running config save, so that this doesn’t break things for anyone else. Thanks for bringing this to our attention!
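On older flyctl versions, or if you just want to see what config save changed, a cautious approach is to keep a copy of the old file and diff it afterwards. A minimal sketch, with an illustrative function name and backup filename:

```shell
# Sketch only: back up the local fly.toml, pull the remote config,
# and show what changed (for example a dropped [build] section or comments).
save_config_with_backup() {
  local app="$1" backup="fly.toml.bak"

  if [ ! -f fly.toml ]; then
    echo "no local fly.toml to back up" >&2
    return 1
  fi

  cp fly.toml "$backup"
  fly config save --app "$app"

  # diff exits non-zero when the files differ; that is expected here.
  diff -u "$backup" fly.toml || true
}
```

Lines prefixed with - in the output are the ones to merge back into the new fly.toml by hand.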

Also wanted to share a few notes here:

  • If you have experimental / enable_consul = true in an apps v1 fly.toml, the migration goes well, but you have to run flyctl consul attach to maintain a similar environment.
  • On automatic migrations, one thing that is probably just our own problem: we had written a few comments here and there, and the file produced by config save no longer has those comments. It just needs a little manual work to merge them back in.
  • Also, if fly.toml previously had an empty [services], it seems that migrate-to-v2 puts a default 8080 service there.

Apart from weird things happening with secrets, we are happy with v1->v2 migration for now.
