worker machine is stopped after every deploy

Hello,

I have 2 Rails projects that are very similar in terms of setup: 1 app + 1 worker (running good_job).

The first project works perfectly fine, but the second one has an issue: every time I deploy, the worker machine stays in “stopped” status.

This is the fly.toml file (both projects have the same file):

app = "my-second-app"
primary_region = "lhr"
console_command = "/rails/bin/rails console"

[processes]
  app = "bin/rails server"
  worker = "bundle exec good_job start"

[deploy]
  release_command = "./bin/rails db:prepare"

[http_service]
  internal_port = 3000
  force_https = true
  auto_stop_machines = false
  auto_start_machines = true
  min_machines_running = 1
  processes = ["app"]

[[vm]]
  size = "shared-cpu-1x"
  memory = "512mb"
  cpus = 2
  cpu_kind = "shared"
  processes = ["app"]

[[vm]]
  size = "shared-cpu-1x"
  memory = "1gb"
  cpus = 2
  cpu_kind = "shared"
  processes = ["worker"]

[[statics]]
  guest_path = "/rails/public"
  url_prefix = "/"

Any help will be super appreciated!

Hi there!

What’s happening here is that your worker machine is designated as a “standby” for another machine. The standby will stay stopped unless the primary machine goes down; at that point the standby starts up and takes over, so that your service can keep operating.

Under this behavior, when you deploy an update the primary machine starts up, and the standby is forced into a stopped state so it can remain on standby. (It can still be started manually.) This explains the behavior you’re seeing.
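
If you want to see this from your side, you can list the app’s machines and check the worker’s state. A minimal sketch, assuming the app name from your fly.toml and that YOUR_WORKER_MACHINE_ID is the worker machine’s ID:

fly machines list -a my-second-app                          # the standby worker will typically show as "stopped"
fly machine status YOUR_WORKER_MACHINE_ID -a my-second-app  # shows the machine's state and recent events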

The usual reason a worker ends up as a standby is that the app was created in a high-availability configuration, which provisions two machines per process group. For HTTP services, both machines can be up and serving requests; for machines with no services (as explained in the doc I linked), one machine is the primary and the other is a standby, as described above. Then, very likely, you destroyed one of the machines in each process group and happened to destroy the worker primary, leaving only the standby around. In the complete absence of a primary (as opposed to an existing but stopped primary), the standby behaves the way you observed.
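
As an aside, if you want to scale down from the high-availability default in the future, you can set the machine count per process group instead of destroying machines by hand. A sketch, assuming the process group names from your fly.toml:

fly scale count app=1 worker=1 -a my-second-app

That way flyctl handles removing the extra machines, which is less error-prone than picking machines to destroy manually.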

Thanks for reading through this very long explanation. The solution is to mark your worker as not-a-standby so it starts properly after a deployment, since it will then know it’s the only machine in that process group:

fly machines update YOUR_WORKER_MACHINE_ID --standby-for '' # This is an empty string with single quotes
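
Once that’s done, the worker should come up on your next deploy. A quick way to double-check (again assuming the app name from your fly.toml):

fly machine start YOUR_WORKER_MACHINE_ID -a my-second-app  # or just wait for your next fly deploy
fly machines list -a my-second-app                         # the worker should now show as "started"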

Let me know if this works.

  • Daniel

Thank you so much for your help, Daniel.

That was exactly what I did: I started the projects with 2 app and 2 worker machines and scaled them down to 1 and 1.

So, just to make sure I understood properly: in the first project (the one that worked fine) I was lucky enough to remove the standby machine instead of the primary one, and in the second project (the one with this issue) I was unlucky enough to remove the primary one instead of the standby. Is this correct?

Your solution fixed the problem.
