Cannot create a10 machines

Hi,

I am currently trying to deploy a machine with 4 A10 GPUs.
The deploy fails with this error:

Error: failed to update VM 5683d642fde2d8: internal: could not reserve resource for machine: no GPUs available to fulfill request (Request ID: 01J00J0968WKXT7FEZ5NR25S8R-ams) (Trace ID: ea7f02741ad8cacce29b7e3e826cfc05)

That's my current fly.toml:

app = "flying-ollama"
primary_region = "ord"
[[vm]]
  size = "shared-cpu-2x"
  memory = "128gb"
  cpus = 4
  cpu_kind = "performance"
  gpus = 4
  gpu_kind = "a10"

[build]
  image = "ollama/ollama"

[mounts]
  source = "models"
  destination = "/root/.ollama"
  initial_size = "100gb"

[http_service]
  internal_port = 11434
  force_https = false
  auto_stop_machines = false
  auto_start_machines = true
  min_machines_running = 1
  processes = ["app"]

Are there really not enough resources available, or is something wrong with my config?

Hey floooat! We don’t have many A10s available at the moment, but this is not a capacity problem per se.

The problem is that machine 5683d642fde2d8 is running in the ams region, on a host that serves A100-80GB GPUs, not A10s. fly deploy tries to update the machine in place, and that fails because A10s are not available on that host.

A simple way around this limitation is to scale the app down with fly scale count 0 and then redeploy. The redeploy may fail if there is no volume named models in the ord region. If that is the case, create it with:

fly volume create --region ord --vm-size a10 --vm-gpus 4 --size 100 models
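Put together, the steps above might look like the sketch below. It is only an illustration: the function name redeploy_on_a10 is made up for this example, and the check for an existing volume is a rough text match on the output of fly volumes list (the exact column layout is an assumption), so read that output yourself before relying on it.

```shell
# Hypothetical helper (name is illustrative): drop the machine that is
# pinned to the A100 host, make sure a "models" volume exists in ord,
# then redeploy so a fresh machine is created on an A10-capable host.
redeploy_on_a10() {
  # Scale to zero so the next deploy creates a machine instead of
  # updating the existing one in place.
  fly scale count 0 || return 1

  # Rough check: is there any line mentioning both "models" and "ord"?
  # This is a plain text match, not a parse of the table.
  if ! fly volumes list | grep -w models | grep -qw ord; then
    fly volume create --region ord --vm-size a10 --vm-gpus 4 --size 100 models || return 1
  fi

  fly deploy
}
```

Run redeploy_on_a10 from the directory that contains fly.toml. The fly commands may still ask for confirmation interactively.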

Also, you can simplify the [[vm]] section as:

size = "a10"
memory = "128gb"
cpus = 4
gpus = 4