Fly GPUs Are Here

About a year ago, Kurt found a secret stash of A100s somewhere and we thought, “hmm, wouldn’t it be great if we could get Fly machines to work with GPUs?” We’ve been working on the idea ever since. Right now we have new A100 40G PCIes and 80G SXMs (read - big bois) coming online every day. They live in ord, iad, ams, sjc, and syd with more regions to come.

How Do I Use These?

We wanted working with GPUs to feel a lot like working with Fly machines, so that’s what we built - machines that are attached to GPUs. As an aside, Firecracker doesn’t play well with GPUs, so we used Cloud Hypervisor, but the experience of working with these VMs is very similar.

To deploy an app on a GPU machine, you’d run something like this in flyctl (but don’t do it now, it won’t work until your org is GPU-blessed):

fly deploy --vm-gpu-kind a100-pcie-40gb --volume-initial-size 100
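
Once it’s deployed, a quick sanity check (assuming your image has the NVIDIA userspace tools installed, e.g. it’s built on one of the nvidia/cuda Docker images) is to run nvidia-smi over SSH:

fly ssh console -C "nvidia-smi"

If the GPU is attached, you’ll see the A100 listed along with its driver version and memory.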

What Can I Use Them For?

Judging by our first batch of GPU users, a variety of things!

AI is, of course, popular - mostly inference with existing open-source models but also with custom models. We’re not set up for training large models, but I could imagine someone using us to fine-tune the last few layers of a model on their own data.

Some users have asked about rendering, and we’re into that. And of course some of you are exploring running your platform on top of Fly GPUs, which we love.

It’s still early, but you can check out some of the demos we’ve put together in our docs.

What’s the Cost?

Right now, the on-demand cost is $2.50/hr per GPU for the A100 40G PCIes and $3.50/hr per GPU for the A100 80G SXMs. There are no usage minimums. You decide the CPU, RAM, and storage you need. We have discounted pricing for reserved GPU machines and dedicated hosts.

How Do I Get One?

Everyone can use them today!

What are YOU interested in building with Fly GPUs? What other GPU-enabled shenanigans would you like to see Fly support? Leave us a note.

:wave: I wrote a thing on using GPU Machines, with some examples and info on how to work with them when you get access:

https://fly.io/blog/transcribing-on-fly-gpu-machines/

What’s the minimum billing duration? 1min? 1h?

Fly.io bills by the second!
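
For a concrete sense of scale: a 90-second burst on an a100-pcie-40gb at $2.50/hr works out to $2.50 / 3,600 × 90 ≈ $0.06.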

Now with 100% more Youtube (and 100% less sunburn, but 20% more crooked hats), here’s a :vhs: on using Fly.io GPU Machines:

@nina @fideloper-fly,

It would be great to implement these in a live environment for our users. However, the combined high costs and the limitation of one GPU per user are significant hurdles. I have a few questions:

  1. Is it possible for GPU machines to scale down to zero when not in use?
  2. Is it feasible to quickly adjust the number of machines in response to workload changes? For instance, if I’m running 3 jobs, can I have 3 machines that scale up and down as needed?

We’re contemplating adding such advanced features for our premium users. We don’t anticipate needing large scale most of the time; often there will be no active instances, while at other times we might need between 5 and 10.

Thank you!

Hi!

Here’s my attempt to answer with the caveat that I’m not 100% sure what you mean by the limitation of one GPU per user - can you clarify?

(I believe there’s a new feature, like from last week: --vm-gpus=N in flyctl to add more than one GPU to a machine at a time, but that doesn’t sound like what you mean).

In general:

Scale to Zero

Yep, you can scale to zero!

The GPU Machines work just like other Fly.io Machines in that they definitely don’t need to be running 24/7. You should spin them up only when you need them.

The caveat: They are generally a bit slower to start than “regular” Machines, mostly due to the size of the image used to boot them. In my experience so far, that’s like…tens of seconds instead of milliseconds.

Using them in conjunction with (a pool of) volumes is advisable so that the largest bits of data (models) are cached. Having those volumes keep a cache of, e.g., pip libraries and model data files can speed up the time to get started on the workload.
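
As a minimal sketch, the manual version of scale-to-zero is just stopping and starting Machines around your workload (the machine ID below is a placeholder):

fly machine stop <machine-id>
fly machine start <machine-id>

A stopped Machine stops accruing compute charges, though storage (e.g. attached volumes) still bills. If your app is request-driven, the auto_stop_machines / auto_start_machines settings in fly.toml can do the stopping and starting for you automatically.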

Multiple Machines

Multiple at the same time is fine!
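
For example, you can stamp out extra copies of an existing GPU Machine with clone, or just set a target count (the machine ID below is a placeholder):

fly machine clone <machine-id>
fly scale count 3

Each job gets its own Machine, and you stop or destroy them when the work is done.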

This sounds awesome, I’m gonna do everything to try and work with Fly as the DX just can’t be beat…

Regarding spin up… with EC2 it takes like 3 minutes to create an instance, and the entire process is a major challenge. We build our own AMIs and make tons of adjustments to try and improve this. Part of the problem is availability of the volume combined with loading the massive base models (>3GB).

Any thoughts on that? We will be revisiting this stack in the next month or so…

@fideloper-fly

So the optimizations on Fly.io are analogous to your work on AWS:

  1. Build stuff into the Docker image you’ll use (analogous to making a pre-baked AMI). You want to keep this image on the smaller side, but of course the Python libraries and stuff are often sizable (even without the model data)
  2. You can use volumes to pretty aggressively cache model files AND things like pip packages

I would try out a strategy of creating ~10 volumes and then attaching one of those volumes to each machine as you spin it up to do some work. Re-using these volumes lets Machines start their work faster, since the data within them (model files + pip caches) persists.

The trick is to mount the volume at ~/.cache (likely /root/.cache) which is usually used as a location for pip cache and model files from the various Python libraries.
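
Concretely, that’s a [mounts] section along these lines in fly.toml (the volume name model_cache here is just an example):

[mounts]
  source = "model_cache"
  destination = "/root/.cache"

With that in place, pip and the various model libraries write their caches to the volume instead of the Machine’s ephemeral root filesystem.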

Pro tip: Once you have one volume, you can use fly volumes fork to copy it. fly volumes fork · Fly Docs
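
A sketch of the whole pool setup, with example names and sizes (the volume ID is a placeholder):

fly volumes create model_cache --region ord --size 100
fly volumes fork <volume-id>

Volumes sharing a name form the pool that the [mounts] source refers to, so repeating the fork gives you as many pre-warmed copies as you want.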

Any idea on the waitlist? I would really love to deploy my MVP and prototypes exclusively on Fly, but there is no indication of position on the list or when it might free up.

@kibb we no longer have a waitlist! You can deploy to a GPU today :slight_smile:

Wherever I try to deploy the A100s, it says that it’s out of capacity. Is there any way I can find out where there is capacity?

@Aeolun Hmmm can you share your region and GPU type? We have plenty of capacity atm; let’s see what else is going on here!

@nina I’m currently running in the ord region with an l40s, and was trying to update this to an a100-40gb.

When that didn’t work I tried with a few other regions (I remember trying Australia). I do this by changing the GPU/region in fly.toml and then running fly deploy.
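
For reference, the relevant bits of fly.toml look something like this (using the [[vm]] size presets; the names here are just the ones discussed in this thread):

primary_region = "ord"

[[vm]]
  size = "a100-40gb"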

Hmm ok that sounds like the right approach for now, but I bet we can make that a nicer experience for you in the future.

Hi @Aeolun

A few things could be going on:

  • a100-40gb is not available in Australia (syd); you can only pick a100-80gb. See Fly GPUs quickstart · Fly Docs
  • Switching from l40s to a100-40gb within the ord region should work, unless the machine has a volume attached. In that case you need to fork the volume, hinting that the destination is an a100-40gb host: fly volumes fork ID --vm-size a100-40gb should do it.