Automatically destroy machines after they stop?

So I finally got the machine orchestration to work. The machines I’m using only do one task, and after they stop automatically, they are not needed again.

Right now, I’m dispatching a delayed queued job in the orchestrator to destroy the machine, but for the sake of having as few moving parts as possible, I’m wondering if there is some kind of built-in functionality for this.

TLDR: Is there an easy way to automatically and permanently destroy machines after they have exited because their process is over?

Hi!
Right now there isn’t a way to automagically destroy a machine. Instead, when a machine finishes (the process exits with a 0 status code), it goes into a “stopped” state and can be started again quickly.

One other way to go about this is to make the machines re-usable, so they can re-run a task (and thus you wouldn’t need to create a new machine when a new task comes in). The viability of that depends on your use case.

I wonder if a Machine could make a call to the Fly API to destroy itself when it’s complete? It should be possible, but there might be issues depending on what you’re orchestrating! For example, the Machine would need to have a Fly API key to do that.

Edit: Based on your other post, you might be running untrusted user code, so perhaps this isn’t the greatest idea!

Okay, thanks for letting me know! For my use case I don’t think reusing the machines makes sense, as the whole point of using them is to run user input code in isolation.

Edit: regarding your edit: Yeah I had that idea too but I assume that will expose the API key, and you are correct, this is running untrusted user code.

Congrats!

Question: can the code that creates a new machine be run in a thread?

If so, a possible design is to spawn a thread that

  1. starts a machine
  2. waits for the machine to complete - note the docs currently don’t show this but you can pass in a “state” query parameter which can be set to “stopped”. If you want a really long timeout, you may need to put this in a loop.
  3. deletes the machine
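
The three steps above can be sketched roughly as follows. This is only an illustration of the control flow, not Fly’s own code: `create`, `wait_stopped` and `destroy` are hypothetical callables that you would back with your own Machines API client, and `max_waits` is an assumed safety limit on the retry loop.

```python
import threading


def run_once(create, wait_stopped, destroy, max_waits=100):
    """Create a machine, block until it stops, then destroy it."""
    machine_id = create()
    try:
        # A single wait call may time out before the machine has stopped,
        # so repeat it (up to max_waits times) until it reports success.
        for _ in range(max_waits):
            if wait_stopped(machine_id):
                break
    finally:
        # Runs even if waiting raised or max_waits ran out,
        # so the machine is not left behind.
        destroy(machine_id)
    return machine_id


def run_in_thread(create, wait_stopped, destroy):
    """Run the whole create-wait-destroy cycle in a background thread."""
    worker = threading.Thread(target=run_once, args=(create, wait_stopped, destroy))
    worker.start()
    return worker
```

Keeping the whole cycle in one function means the orchestrator does not have to track machine state between separate jobs.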

If so, a possible design is to spawn a thread that…

This is what I was originally trying to do. But creating the machine makes it start automatically, and this request is not blocking. Are you saying that the “wait” endpoint can be used in conjunction with the create request to make the latter blocking? If so, that would be great because having the whole create-wait-destroy request cycle blocking would reduce a ton of state checks by having it all in a single thread.

Update: To anyone reading this with a similar question, I just wanted to mention that this works great. The key, which I didn’t understand at first, is that you need to make two API calls. I tried to get the create call to block, but that didn’t work. Instead, immediately after creating the machine, send a second API call to the wait endpoint; this one will block. When you receive its response, you can delete the machine.
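
For readers who want to see the shape of those calls, here is a minimal sketch that only builds the requests. The base URL, the paths, the `config` body key and the `timeout` parameter are assumptions based on the public Machines documentation and may differ in your setup; the thread itself only confirms the `state` query parameter on the wait endpoint. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL of the Machines API; adjust for your environment.
API_BASE = "https://api.machines.dev/v1"


def build_request(method, path, token, body=None, query=None):
    """Build (but do not send) an authenticated JSON request."""
    url = API_BASE + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    data = json.dumps(body).encode() if body is not None else None
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def create_request(app, token, config):
    # Call 1: create the machine. It starts automatically and this call does not block.
    return build_request("POST", f"/apps/{app}/machines", token, body={"config": config})


def wait_request(app, machine_id, token, timeout=60):
    # Call 2: block until the machine is stopped or the timeout elapses.
    return build_request(
        "GET", f"/apps/{app}/machines/{machine_id}/wait", token,
        query={"state": "stopped", "timeout": timeout},
    )


def destroy_request(app, machine_id, token):
    # Call 3: delete the machine once the wait call has returned.
    return build_request("DELETE", f"/apps/{app}/machines/{machine_id}", token)
```

Each request can then be sent with `urllib.request.urlopen(request)`. If the wait call times out before the machine has stopped, send it again.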

Yes, that is what I am saying.

Here is code that works in Ruby on Rails: Machine API · Fly Docs

Each time you see Fly::Machine:: that is a call to a Ruby module that does the equivalent of the curl commands in the machines documentation, so replace them with what you currently have.

Note that I’m currently calling “get a machine” in a loop. That’s because the current documentation for wait doesn’t mention passing in state as a query argument. I’ll fix my code and the machine documentation, but you don’t need to wait for me. For this purpose, the difference between wait and get a machine is that wait will block until the machine reaches the requested state or the specified timeout elapses.

Pretty much the rest of the logic you see there is logging.

Oh wow that’s awesome! Thank you so much!

Machines are designed to be safely reusable between users. When they stop, we reset the whole state so they start up “clean” on the next run.

Okay that is really really awesome!

Machines · Fly Docs has been updated:

  • States that no VPN/proxy is required if you are running on a Fly VM, and shows how to set the token as a secret.
  • Added a description of the state parameter to the wait API
  • Added a sentence to the start API to say that restarted VMs are reset.

Awesome! I’m trying to refactor to reuse an existing machine, but am having some issues passing data to it. The way I first had it set up, I sent the required data as environment variables when creating the machine, but I can’t seem to pass environment variables when starting an existing machine. Is this not supported, or do I need a different syntax? I’m not seeing anything in the Fly machine docs.

By the way, I’m impressed with how fast you got the docs updated! I appreciate that.

This didn’t happen in our case: Error failed to wait for VM in started state: failed to wait for machine to be ready :(

Hi!

That’s correct (and it has bitten me before too!).

Currently, environment variables can’t be changed when starting a stopped machine - they’re basically set in stone when you first create a machine. That limitation is something being discussed.

In the meantime, you would need another way to get data into a machine. Some ideas:

  1. Perhaps when it starts, it polls an endpoint to see if there’s a “job” waiting to be done and gets data from the response to that polling.
  2. Perhaps the machine has a small program that “listens” for connections on the local private network (via http, grpc, something like that). When your code base starts the machine, it can then connect and send it the data it needs to know how to run a job.
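
As a rough illustration of the second idea, the machine could run something like the sketch below at startup. It is only one possible shape: the function names are hypothetical, the bind address and port are placeholders you would replace with the machine’s private-network address, and a real version running untrusted code would also need to authenticate the sender.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_job_server(host="127.0.0.1", port=0):
    """Create a small HTTP server that stores each POSTed JSON body as a job."""
    jobs = []

    class JobHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            jobs.append(json.loads(self.rfile.read(length)))
            self.send_response(202)  # accepted; the job runs after we return
            self.end_headers()

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return HTTPServer((host, port), JobHandler), jobs


def wait_for_job(server, jobs):
    """Handle requests until one job has arrived, then return it."""
    try:
        while not jobs:
            server.handle_request()
    finally:
        server.server_close()
    return jobs[0]
```

The code that starts the machine would then POST the job data to this listener once the machine is up, which takes the place of the environment variables used at creation time.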

I’ve only used Machines from an API point of view so far personally (vs using them as a replacement for Fly Apps scheduled by Nomad), so I can’t say for sure what’s up there.

However, that sounds like a bit of a different case than what we’re discussing here. I think fly deploy, when running a machine, replaces the machine totally, while in this thread we are re-using an existing machine.

Currently, environment variables can’t be changed when starting a stopped machine - they’re basically set in stone when you first create a machine. That limitation is something being discussed.

Ah okay, well for this case I think I will go back to creating a new machine for each job. I am not necessarily all that concerned about speed. It would be awesome if we could pass some data when starting a machine. I’ll definitely second that feature request.

The machine-ids and machine-names before and after flyctl deploy remain the same, though.

I wanted to point out that in my case, all (i.e., 29) machines of a single app stopped, and then could never be started again.