Machines first impressions

I’ve spent the last few days playing with Machines, with little prior experience on the Fly platform, and wanted to record some feedback. I know Machines are very new and I’m sure a lot of this is already known, but I’m hoping at least some of it is useful, as I haven’t seen much discussion of these points on the forum.

Documentation

The Fly documentation is generally great and high quality. However, the introduction of machines makes much of it quite confusing. My perspective is that I am new to the platform and am primarily here for machines. The concept of “apps” is confusing because there seem to be two kinds of apps (regular apps and machines apps, though this isn’t explicitly stated anywhere), and it’s not clear which docs are talking about regular apps versus machines apps. To use machines I need to know enough about apps to understand how network routing works, but I also need to ignore most of the material about per-app configuration.

Some examples of confusion:
fly.toml seems to be the way to go when running flyctl commands, but when dealing with machines it is basically only needed for the app field. This initially led me to think I should deploy an app and also build machines, and that they are designed to coexist in some way (and in fact you can do that, which is confusing).

Volumes: it seems like volumes just can’t work with machines, but nothing says this explicitly. Since I can assign an IP address at the app level with flyctl, I assumed I could do the same with volumes, but that doesn’t seem to be the case, since there isn’t a supported field for it in the machine creation POST request body. This is another case where you have to ignore most of the platform docs except those specifically mentioning machines.

registry.fly.io is almost undocumented. The only page I could find referencing it is “flyctl auth docker · Fly Docs”, and I had to search the forum to figure out how it is intended to be used and what its privacy model is. I think this is because flyctl generally abstracts the registry away from you, but for machines it seems very relevant for deploying private Docker images.

Generally, searching the forum has been very useful for finding implementation details about the Fly platform (outside of machines) that aren’t described in depth on the docs site, and I’m sure the same will be true of machines in the future.

Using Machines

Disclaimer: I’ve been playing around with UDP, and as far as I can tell UDP is a bit of a second-class citizen compared to TCP here, so I might have found edge cases with it and machines.

It’s a bit weird to me that machines are the only place where you advertise using API calls directly instead of flyctl calls. I get that, since machines are probably aimed at programmatic use, but the mix of flyctl calls like flyctl ips allocate-v4 and network requests is a bit odd. It’d be great if there were examples of all steps using either flyctl or network requests. (I’ve read on the forum that there is an unadvertised GraphQL API that can be used for many things; I’m planning to look into it to get a fully API-driven workflow for provisioning machines.)

In my time playing around with machines I’ve found a few bugs, which I’ll list below, but generally speaking it’s a really cool platform that definitely lives up to the expectations set by the product announcement. It’s extremely fast and works great when you’re doing everything right.

Bugs

fly machine list often shows outdated info. flyctl machine status $machineId shows expected status in most cases.

I often end up with a machine stuck in a state where I can’t remove it, because that state isn’t eligible for removal. The only way I’ve found to remove these machines is to delete and recreate the app (flyctl machine kill, flyctl machine stop and flyctl machine remove don’t work). Some ways to reliably reproduce this:

  • Create a machine with a Docker image from the Fly registry that hasn’t been uploaded yet. It will be stuck in the “creating” state forever.
  • When my app on the machine exits with code 0, the machine starts itself again when a network request comes in (it starts super quickly, by the way; it feels very magical). When this happens, the status change isn’t recorded in the platform (it is stuck at “stopped”), nor does the machine get registered with the *.internal DNS addresses. I can’t use any of the commands to control it since it is in the “stopped” state, but if I run fly ssh console "$ipv6Address" I can connect to the machine (interestingly, plain fly ssh console won’t find it, presumably because the DNS entries aren’t registered).

Billing

A quick note on hacking around with the Fly platform. The pricing seems friendly for playing around, which is awesome, but I don’t have a good way to tell whether I currently have resources that are being billed (for example, if I forgot to clean something up, or kept an app around to avoid recreating it tomorrow), and I also can’t see in real time how much the resources I am using cost. It doesn’t feel great to be uncertain whether my bill will be $0 or $50 until some undetermined point in the future when the bill pops up.

Hey! I’m Dov. I am a private contractor working for Fly on machine-related things (mostly infrastructure as code), but I am not directly an employee of Fly, so don’t take anything I say as the company line :slight_smile: But let me see if I can’t answer some things.

Yeah this is confusing. It is an unfortunate fact of life at the moment. Eventually the concept of “apps” will be rebuilt around machines, but for now, while machines are in beta, both exist. There could probably be some better docs on this.

Volumes work but aren’t documented. I’ll check and make sure it’s fine to explain here.

flyctl will catch up soon. I keep meaning to add at least a bit more stuff to flyctl; it’s just a bit behind while machines are being actively developed in beta.

Yeah, don’t use the GraphQL API with machines for now. I’m pretty sure at the moment that would still cause some “undefined behavior”, so to speak. Again, once everything becomes more stable the API will be better.

Ha, yeah, I ran into trouble with this issue when I was working on the machines part of our Terraform provider. The actual API that flyctl machine list uses behind the scenes is a victim of heavy caching. Always favor querying the ID directly to get up-to-date information. The list is only eventually consistent.
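
A rough sketch of that advice in Python, for anyone scripting around this. It polls a single machine by ID instead of trusting the cached list. Note that fetch_state is a hypothetical callable you would supply yourself (for example, something that wraps flyctl machine status for one ID); it is not part of any Fly library, and the state strings are only examples.

```python
import time
from typing import Callable


def wait_for_state(
    fetch_state: Callable[[str], str],
    machine_id: str,
    wanted: str,
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Poll one machine by ID until it reports `wanted` or the timeout expires.

    `fetch_state` is a hypothetical helper (an assumption of this sketch)
    that returns the current state string for a single machine ID.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Always ask about this specific ID; never read the list output.
        if fetch_state(machine_id) == wanted:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The point of passing the lookup in as a function is that the polling logic stays the same whether the per-ID query goes through flyctl or through an HTTP request.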

Yeah this is a known problem. I flagged it internally the other day and was informed a fix is being worked on. So watch out for that.

I know nothing about the billing. Gonna tag in @jsierles on this one!

I hope this helps at least a bit to clear some stuff up.

I don’t know if I’d put it that way, but UDP is routed differently from TCP here — you can simply proxy TCP, and that’s what we do, we scale out a fleet of proxies. But you can’t do that easily with UDP, because UDP doesn’t have sessions or headers to stuff the original source address in. So we route UDP in the kernel.
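
As an illustration only (this is not a description of how fly-proxy is implemented): one widely used way for a TCP proxy to pass the original client address along is the PROXY protocol, where the proxy writes one short text line at the very start of the stream. The sketch below builds a version 1 line in Python. The function names and the example addresses are assumptions made for this example.

```python
def proxy_v1_header(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Build a PROXY protocol v1 line for a TCP-over-IPv4 connection."""
    return f"PROXY TCP4 {src_ip} {dst_ip} {src_port} {dst_port}\r\n".encode("ascii")


def first_bytes_to_backend(client_addr, server_addr, payload: bytes) -> bytes:
    """Return what a proxy would send first on a fresh TCP connection.

    A TCP stream has a well-defined beginning, so the header can be sent
    once and the backend can strip it before reading application data.
    A bare UDP datagram has no such session start to attach it to.
    """
    header = proxy_v1_header(client_addr[0], server_addr[0], client_addr[1], server_addr[1])
    return header + payload
```

With UDP every datagram stands alone, so there is no equivalent place to put that line without changing every packet the application sees.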

UDP should work the same way for machines as it does for apps! If you’ve found something where it doesn’t, please let us know.

There’s nothing about using UDP that would change the way we orchestrate a machine vs. an app instance; the network plumbing is the same (and it’s not especially intrusive in our orchestration code; there isn’t some special handling we do for apps that happen to use UDP).

Thanks for the replies! Even if it’s just noting that things are a bit confusing for now and will be improved in the future, it’s great to hear.

@DAlperin It’s great to hear volumes work, and I’ll keep an eye out for when that gets documented.

@thomas the reason I made that comment about UDP is that UDP seems to be missing a lot of behavior I’ve seen working with TCP. Some examples are IPv6 as documented, and machine autostarting not working with UDP.

As noted above, I can sometimes get into a weird state where the machine does start up but isn’t listed anywhere as started, but generally I cannot reliably get a machine to start up based on a UDP connection. To reproduce this I created a simple app that listens on TCP and UDP (on different ports for simplicity) and then shuts down after 10 seconds. Once the machine is shut down I can reliably ping the TCP server to start it up, but I was never able to start it with UDP packets. I was, however, able to confirm that both the TCP and UDP messages were received by the server once it was running (happy to share my demo code if it’d help).
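
For anyone who wants to try the same experiment, a test app along those lines could look roughly like this in Python. This is a sketch and not the original demo code: the function name serve, the host parameter, and the ports are all assumptions, and the bind address needed on a real machine may differ from the default used here.

```python
import selectors
import socket
import time


def serve(tcp_port: int, udp_port: int, lifetime: float = 10.0, host: str = "0.0.0.0") -> list:
    """Listen on TCP and UDP for `lifetime` seconds, then return.

    Returns a list of (protocol, payload) tuples for everything received,
    so the caller can exit with code 0 afterwards.
    """
    received = []
    sel = selectors.DefaultSelector()

    tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    tcp.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    tcp.bind((host, tcp_port))
    tcp.listen()
    sel.register(tcp, selectors.EVENT_READ, "tcp")

    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.bind((host, udp_port))
    sel.register(udp, selectors.EVENT_READ, "udp")

    deadline = time.monotonic() + lifetime
    try:
        # Wait on both sockets until the lifetime runs out.
        while (remaining := deadline - time.monotonic()) > 0:
            for key, _events in sel.select(timeout=remaining):
                if key.data == "tcp":
                    conn, _addr = tcp.accept()
                    with conn:
                        conn.settimeout(1.0)
                        try:
                            data = conn.recv(1024)
                        except socket.timeout:
                            data = b""
                    received.append(("tcp", data))
                else:
                    data, _addr = udp.recvfrom(1024)
                    received.append(("udp", data))
                # Log each message so it shows up in the machine's output.
                print(received[-1], flush=True)
    finally:
        sel.close()
        tcp.close()
        udp.close()
    return received
```

Calling something like serve(8080, 9090) as the entrypoint and letting the process end normally afterwards gives the "exits with code 0 after 10 seconds" behavior described above; the two port numbers are arbitrary.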

Just noting it here: I have noticed this UDP-esque behaviour with TCP (public-ip), too. Sometimes more than one request is required to bring up the VM (or rather, to successfully connect to it). There’s some signalling issue between the edge (fly-proxy) and the worker (machine-vm), it’d seem.

Just a point: personally I despise any notion of machines. This is something I don’t want to care about, and it is why I’m choosing fly.io in the first place. All I care about is apps. I don’t want to care about the infrastructure the app is running on.

You don’t need to worry about machines. We needed them for ourselves, they’re there if you have a use for them, and the PaaS level tooling on top of them will continue to improve.

@thomas do you know if autoscaling based on UDP traffic is working yet?

Not Thomas but we don’t support autoscaling for UDP traffic atm. We currently don’t have any timeline on when that would be available.

Got it - that’s a little disappointing, but I will work around it. Do you think you could get the docs updated to be specific about this?

The autostart docs currently just say:

  • If there’s only one Machine in the region:
      • the proxy checks if the Machine has any traffic
      • if the Machine has no traffic (a load of 0), then the proxy stops the Machine

but it isn’t clear that this refers only to TCP (?) or HTTP (?) traffic, and not UDP.