What’s happening here is that your worker machine is designated as a “standby” for another machine. The standby will stay stopped unless the primary machine goes down; at that point the standby starts up and takes over, so that your service can keep operating.
Under this behavior, when you deploy an update, the primary machine starts up and the standby is forced into a stopped state so it can resume its standby role. The standby can still be started manually, however. This explains the behavior you're seeing.
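If you do ever need the worker while its primary is stopped, you can start the standby yourself (this is just the standard machine start command; replace the placeholder with your machine ID):

fly machine start YOUR_WORKER_MACHINE_ID # manually start the stopped standby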
The usual reason a worker ends up as a standby is that you created the app in a high-availability configuration, which provisions two machines per process group. For http services, both machines can be up and serving requests; for machines with no services (as explained in the doc I linked), one machine is the primary and the other is a standby, as described above. Then, very likely, you destroyed one of the machines in each process group and happened to destroy the worker primary, leaving only the standby around. In the complete absence of a primary (as opposed to an existing-but-stopped primary), the standby behaves the way you observed.
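If you want to check which machine you're left with and what role it has, listing the machines and inspecting the remaining worker should show this (the standby relationship should appear in the machine's details, though the exact output depends on your flyctl version):

fly machines list # lists every machine with its process group and current state
fly machine status YOUR_WORKER_MACHINE_ID # prints details for one machine; the standby relationship should show up here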
Thanks for reading through this very long explanation. The solution is to mark your worker as not-a-standby so it starts properly after a deployment, since it will then know it's the only machine in that process group:
fly machines update YOUR_WORKER_MACHINE_ID --standby-for '' # This is an empty string with single quotes
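After clearing the standby designation, a redeploy should leave the worker running on its own. Something like this should let you confirm it:

fly deploy # redeploy the app
fly machines list # the worker machine should now be started after the deploy, not stopped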
That's exactly what I did: I started the projects with 2 app and 2 worker machines and downsized them to 1 and 1.
So, just to make sure I understood properly: in the first project (the one that worked fine) I was lucky enough to remove the standby machine instead of the primary one, and in the second project (the one where I had this issue) I was unlucky enough to remove the primary instead of the standby. Is this correct?