VM size planning

We’re exploring GPUs! They’re a hard problem because GPUs aren’t meant for multitenancy, so you have to figure out how to allocate an entire GPU to a VM. That also means they’re expensive.

Right now, we have two classes of VMs: shared-cpu and dedicated-cpu. We’re planning to release:

  • Larger shared CPU VMs, probably shared-cpu-2x, 4x, and 8x. These will give you shared access to up to 8 CPUs. They might be pretty good for workloads with bursty CPU.
  • More dedicated CPUs. Right now we only go to 8. We could theoretically go to 24.

We’ve also been considering more permanent “reservations” for dedicated CPU VMs.

What kinds of things could you all use?


It could be really nice to have a CPU option optimized for background/offline jobs, de-prioritized so that it only soaks up otherwise-unused CPU capacity. There must be some kind of cgroup magic that can be done to achieve that. Just to be clear, this would only make sense if that CPU option were also cheaper.
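A minimal sketch of the sort of cgroup magic I mean, assuming cgroup v2 is mounted at /sys/fs/cgroup (the group name and paths here are made up):

```python
# Sketch only: put background jobs in a cgroup that yields to everything else.
# Assumes cgroup v2 at /sys/fs/cgroup with the cpu controller enabled.
from pathlib import Path

CG = Path("/sys/fs/cgroup/background-jobs")  # hypothetical group name

def make_background_cgroup() -> None:
    CG.mkdir(exist_ok=True)
    # Minimum scheduling weight (range 1-10000, default 100): the group
    # only gets CPU time that higher-weight groups aren't using.
    (CG / "cpu.weight").write_text("1")
    # Kernels >= 5.15 go further: cpu.idle=1 puts the whole group in the
    # SCHED_IDLE class, i.e. strictly leftover capacity.
    idle = CG / "cpu.idle"
    if idle.exists():
        idle.write_text("1")

def move_pid(pid: int) -> None:
    # Move an existing process into the group; its new children follow it.
    (CG / "cgroup.procs").write_text(str(pid))
```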

If you could “innovate” even further and make that CPU class only charge for core-seconds actually consumed, that would be amazing.
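The metering half already exists in the kernel, for what it’s worth: cgroup v2’s cpu.stat keeps a cumulative usage counter per group, so charging for consumed core-seconds is “just” billing plumbing. A sketch:

```python
# Sketch: read cumulative CPU consumption for a cgroup from cpu.stat.
# usage_usec is total core-microseconds used by everything in the group.
from pathlib import Path

def core_seconds(cgroup: Path) -> float:
    for line in (cgroup / "cpu.stat").read_text().splitlines():
        key, value = line.split()
        if key == "usage_usec":
            return int(value) / 1_000_000  # core-microseconds -> core-seconds
    raise RuntimeError("usage_usec not found in cpu.stat")

# A billing loop would sample this periodically and charge on the delta.
print(core_seconds(Path("/sys/fs/cgroup/background-jobs")))
```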

  • For CI, having a dedicated-cpu VM with a high CPU count would improve build speed.
  • A 4GB-memory shared-cpu option would be great for a few of our services that aren’t used often but load large models.
  • For what it’s worth, the GPU is really not a big priority for us, so don’t count it as a strong vote from me.

Is there an update on when larger shared CPU VMs might become available, e.g. shared-cpu-2x?

I would use GPUs to train my machine learning models, for example on a recurring basis.

We have larger shared-cpu-8x VMs available with Fly Machines. The apps you’re used to don’t have access to them yet, but we’re working on that!
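If you want to poke at one now, you can create it through the Machines API directly. Roughly like this; the app name and image are placeholders, and check the Machines docs for the exact endpoint and guest fields:

```python
# Sketch: create a shared-cpu-8x Machine via the Machines API.
# App name, image, and exact field names here are illustrative.
import os
import requests

resp = requests.post(
    "https://api.machines.dev/v1/apps/my-app/machines",  # hypothetical app
    headers={"Authorization": f"Bearer {os.environ['FLY_API_TOKEN']}"},
    json={
        "config": {
            "image": "nginx:latest",
            "guest": {"cpu_kind": "shared", "cpus": 8, "memory_mb": 2048},
        }
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["id"])
```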

GPUs are on my long term wishlist. They’re a hard problem, though, and it’ll take at least a year before we do any GPU work.

Even then, I don’t think we’re going to be an ideal place to train models. We’ll be a great place to do inference on already-trained models, though. 🙂


I’d like to add a few things on this note.

We (a fintech) have successfully trialed running trained TensorFlow models on Fly CPU instances, which, as you’d expect, works without issues.
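For context, the serving side is nothing exotic; roughly this, where the model path and the signature’s input name are placeholders rather than our actual export:

```python
# Sketch: CPU-only inference against an exported TensorFlow SavedModel.
# The path, input name, and shape are placeholders, not our real model.
import numpy as np
import tensorflow as tf

model = tf.saved_model.load("/models/scorer")  # hypothetical export dir
infer = model.signatures["serving_default"]

batch = tf.constant(np.random.rand(32, 128).astype("float32"))
outputs = infer(inputs=batch)  # keyword must match the signature's input name
print({k: v.shape for k, v in outputs.items()})
```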

However, due to the nature of some of our models, CPU alone is simply not enough to reach acceptable performance.

It should be quite obvious that actually training models is not what Fly is meant for; trying to take on Gradient or Lambda Labs would be nonsense when a $10 Colab subscription gives you a P100 to train on. But deploying these models into redundant production is something that I believe could propel your infrastructure to new spheres, as long as it isn’t completely overpriced.

I’m the CTO (and use Fly personally) and have had the “pleasure” of exploring other options for our trained models. On that note, let me add that I would absolutely love to use Fly for our business instead of something else, and I’d like to specifically comment on this reply:

They’re a hard problem because GPUs aren’t meant for multitenancy

The NVIDIA A100 is actually designed to be shared in multi-instance/virtualisation environments (NVIDIA calls this MIG, Multi-Instance GPU). I’m specifically mentioning this because another option I have looked at, Vultr, offers VM instances with GPU shares that are allocated using this feature. The cheapest option, $90 if I remember correctly, allocates 1/20th of an A100 80GB to the VM, with the bigger plans adding more fractional shares.

I can also assure you that models in general won’t stay runnable on CPUs for much longer; GPUs are quickly becoming necessary by default. If you can match this pricing to some degree, while still offering the somewhat unique benefits of your infrastructure, I can 100% guarantee that you will have unimaginable demand lining up at your door very soon. I know of one business personally that would migrate immediately.

There’s a very obvious and distinct gap in the market here; you could definitely get your foot in the door and capture some of the first-mover advantage.
