Automatically destroy machines after they stop?

caen · September 21, 2022, 5:38pm

So I finally got the machine orchestration to work. The machines I’m using only do one task, and after they stop automatically, they are not needed again.

Right now, I’m dispatching a delayed queued job in the orchestrator to destroy the machine, but for the sake of having as few moving parts as possible, I’m wondering if there is some kind of built-in functionality for this.

TLDR: Is there an easy way to automatically and permanently destroy machines after they have exited, once the process is over?

fideloper-fly · September 21, 2022, 5:46pm

Hi!
Right now there isn’t a way to automagically destroy a machine. Instead, when a machine finishes (the process exits with a 0 status code), the machine goes into a “stopped” state and can be started again quickly.

One other way to go about this is to make the machines re-usable, so they can re-run a task (and thus you wouldn’t need to create a new machine when a new task comes in). The viability of that depends on your use case.

I wonder if a Machine can make an API call to destroy itself when it’s complete? It should be possible, but there might be issues with that depending on what you’re orchestrating! For example, it would need to know a Fly API key to do that.

Edit: Based on your other post, you might be running untrusted user code, so perhaps this isn’t the greatest idea!

caen · September 21, 2022, 5:52pm

Okay, thanks for letting me know! For my use case I don’t think reusing the machines makes sense, as the whole point of using them is to run user input code in isolation.

Edit: regarding your edit: Yeah I had that idea too but I assume that will expose the API key, and you are correct, this is running untrusted user code.

rubys · September 21, 2022, 5:58pm

Congrats!

Question: can the code that creates a new machine be run in a thread?

If so, a possible design is to spawn a thread that

1. starts a machine
2. waits for the machine to complete. Note that the docs currently don’t show this, but you can pass in a “state” query parameter which can be set to “stopped”. If you want a really long timeout, you may need to put this in a loop.
3. deletes the machine
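
A minimal sketch of that design in Python (not the Rails code linked later in this thread). The base URL, the endpoint paths, the status-code handling, and the helper names `http_request`, `run_once`, and `spawn` are assumptions for illustration, so check them against the Machines docs before relying on them. The transport is passed in as a parameter, which lets the flow be exercised without touching the real API.

```python
import json
import threading
import urllib.error
import urllib.request

# Assumption: base URL of the Machines API; adjust to what the docs specify.
API_BASE = "https://api.machines.dev/v1"


def http_request(method, path, token, body=None, timeout=90):
    """Hypothetical helper: send one JSON request, return (status, parsed body)."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        API_BASE + path,
        data=data,
        method=method,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            raw = resp.read()
            return resp.status, (json.loads(raw) if raw else {})
    except urllib.error.HTTPError as err:
        # Non-2xx responses (for example a wait that timed out) land here.
        return err.code, {}


def run_once(app, config, token, request=http_request, max_waits=20):
    """Create a machine, block until it stops, then delete it."""
    status, machine = request("POST", f"/apps/{app}/machines", token, {"config": config})
    if status >= 300:
        raise RuntimeError(f"create failed with HTTP {status}")
    machine_id = machine["id"]
    try:
        # Each wait call blocks for at most `timeout` seconds, so loop for longer jobs.
        # Assumption: HTTP 200 means the machine reached the requested state.
        for _ in range(max_waits):
            status, _body = request(
                "GET", f"/apps/{app}/machines/{machine_id}/wait?state=stopped&timeout=60", token
            )
            if status == 200:
                return machine_id
        raise TimeoutError(f"machine {machine_id} did not stop")
    finally:
        # Attempt the delete whether or not the wait succeeded.
        request("DELETE", f"/apps/{app}/machines/{machine_id}", token)


def spawn(app, config, token, request=http_request):
    """Run the whole cycle in a background thread so the caller is not blocked."""
    thread = threading.Thread(target=run_once, args=(app, config, token, request), daemon=True)
    thread.start()
    return thread
```

Keeping create, wait, and delete in one function means the orchestrator does not have to track machine state anywhere else; the thread either finishes the cycle or raises.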

caen · September 21, 2022, 6:35pm

If so, a possible design is to spawn a thread that…

This is what I was originally trying to do. But creating the machine makes it start automatically, and this request is not blocking. Are you saying that the “wait” endpoint can be used in conjunction with the create request to make the latter blocking? If so, that would be great because having the whole create-wait-destroy request cycle blocking would reduce a ton of state checks by having it all in a single thread.

Update: To anyone reading this with a similar question, I just wanted to mention that this works great. The key here, which I didn’t understand at first, is that you need to make two API calls. I tried to get the create call to be blocking, but that didn’t work. Instead, after creating the machine, immediately send a new API call to the wait endpoint; this one will be blocking. When you then receive a response, you can delete the machine.

rubys · September 21, 2022, 6:57pm

Yes, that is what I am saying.

Here is code that works in Ruby on Rails: Machine API · Fly Docs

Each time you see Fly::Machine:: that is a call to a Ruby module that does the equivalent of the curl commands in the machines documentation, so replace them with what you currently have.

Note that I’m currently calling “get a machine” in a loop. That’s because the current documentation for wait doesn’t mention passing in state as a query argument. I’ll fix my code and the machine documentation, but you don’t need to wait for me. For this purpose, the difference between wait and get a machine is that wait will block until the machine reaches the requested state or the specified timeout expires.
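
For comparison, the “get a machine” in a loop approach could look roughly like the sketch below. `get_state` is a hypothetical callable standing in for whatever code calls “get a machine” and returns the machine’s state string; the interval and poll count are arbitrary defaults, not values from the docs.

```python
import time


def poll_until_stopped(get_state, interval=2.0, max_polls=150, sleep=time.sleep):
    """Poll a state getter until it reports "stopped"; return True on success."""
    for attempt in range(max_polls):
        if get_state() == "stopped":
            return True
        # Sleep between polls, but not after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval)
    return False
```

Unlike wait, every iteration here is a separate request that returns immediately, so the caller decides how often to ask and pays for each poll.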

Pretty much the rest of the logic you see there is logging.

caen · September 21, 2022, 7:43pm

Oh wow that’s awesome! Thank you so much!

kurt · September 21, 2022, 8:19pm

Machines are designed to be safely reusable between users. When they stop, we reset the whole state so they start up “clean” on the next run.

caen · September 21, 2022, 8:24pm

Machines are designed to be safely reusable between users. When they stop, we reset the whole state so they start up “clean” on the next run.

Okay that is really really awesome!

rubys · September 21, 2022, 11:52pm

Machines · Fly Docs has been updated:

- States that no VPN/proxy is required if you are running on a Fly VM, and shows how to set the token as a secret.
- Added a description of the state parameter to the wait API.
- Added a sentence to the start API to say that restarted VMs are reset.

caen · September 22, 2022, 2:00pm

Awesome! I’m trying to refactor to reuse an existing machine, but am having some issues passing data to it. The way I had it set up at first, I sent some required data via environment variables when creating the machine, but I can’t seem to pass environment variables when starting an existing machine. Is this not supported, or do I need another syntax? I’m not seeing anything in the Fly Machines docs.

By the way, I’m impressed with how fast you got the docs updated! I appreciate that.

ignoramous · September 22, 2022, 2:03pm

This didn’t happen in our case: Error failed to wait for VM in started state: failed to wait for machine to be ready

fideloper-fly · September 22, 2022, 2:25pm

Hi!

That’s correct (and has bitten me before too!) -

Currently, environment variables can’t be changed when starting a stopped machine - they’re basically set in stone when you first create a machine. That limitation is something being discussed.

In the meantime, you would need another way to get data into a machine. Some ideas:

- Perhaps when it starts, it polls an endpoint to see if there’s a “job” waiting to be done and gets data from the response to that polling.
- Perhaps the machine has a small program that “listens” for connections on the local private network (via http, grpc, something like that). When your code base starts the machine, it can then connect and send it the data it needs to know how to run a job.
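
The first idea (poll for a job on startup) could be sketched as follows, running inside the machine. Everything here is hypothetical: the function names, the convention that the orchestrator answers 200 with a JSON job or 204 when nothing is queued, and the retry defaults.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_job(url, timeout=10):
    """Ask the orchestrator for work. Returns a dict, or None if nothing is queued.

    Assumption: the (hypothetical) endpoint answers 200 with a JSON job,
    or 204 with an empty body when there is nothing to do.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status == 204:
                return None
            return json.loads(resp.read())
    except urllib.error.URLError:
        # Orchestrator unreachable or returned an error; treat as "no job yet".
        return None


def wait_for_job(fetch, attempts=10, delay=1.0, sleep=time.sleep):
    """Call `fetch` until it returns a job or the attempts run out."""
    for attempt in range(attempts):
        job = fetch()
        if job is not None:
            return job
        # Pause between attempts, but not after the last one.
        if attempt < attempts - 1:
            sleep(delay)
    return None
```

At startup the machine would call something like `wait_for_job(lambda: fetch_job(url))`, where `url` points at an endpoint you expose from your orchestrator, and exit if it gets `None` back.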

fideloper-fly · September 22, 2022, 2:27pm

I’ve only used Machines from an API point of view so far personally (vs using them as a replacement for Fly Apps scheduled by Nomad), so I can’t say for sure what’s up there.

However that sounds like a bit of a different case than what we’re discussing here - I think fly deploy, when running a machine, replaces the machine totally - while in this thread we are re-using an existing machine.

caen · September 22, 2022, 2:27pm

Currently, environment variables can’t be changed when starting a stopped machine - they’re basically set in stone when you first create a machine. That limitation is something being discussed.

Ah okay, well for this case I think I will go back to creating a new machine for each job. I am not necessarily all that concerned about speed. It would be awesome if we could pass some data when starting a machine. I’ll definitely second that feature request.

ignoramous · September 23, 2022, 3:46am

The machine-ids and machine-names before and after flyctl deploy remain the same, though.

I wanted to point out that in my case, all (i.e., 29) machines of a single app stopped, and then could never be started again.
