Build and deploy llama.cpp server on fly.io

Hi all, I’ve written a Dockerfile and a fly.toml configuration file that build and deploy a llama.cpp server on fly.io. Here’s the GitHub repo.

It keeps dependencies to a minimum so the image stays small. Model files are downloaded on the first boot and cached in a volume, so subsequent cold starts are fast.
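For anyone curious how the download-once-then-cache idea works, here is a rough Python sketch. It is not taken from the repo, just an illustration of the pattern: the function name `ensure_model` and its parameters are made up for this example, and it assumes the volume is mounted at whatever directory you pass in.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def ensure_model(url: str, cache_dir: str, filename: str) -> Path:
    """Return the path of the cached model, downloading it only if missing.

    Hypothetical helper: cache_dir is assumed to be the mount point of a
    persistent volume, so the file survives machine restarts.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    target = cache / filename

    # Cache hit: a previous boot already fetched the model, skip the download.
    if target.exists():
        return target

    # Cache miss: download to a temp file in the same directory, then rename.
    # The rename is atomic, so an interrupted download never leaves a
    # half-written file under the final name.
    fd, tmp_name = tempfile.mkstemp(dir=str(cache), suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as resp:
            shutil.copyfileobj(resp, tmp)
        os.replace(tmp_name, target)
    except BaseException:
        # Clean up the partial file before re-raising.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return target
```

An entrypoint script could call something like `ensure_model(os.environ["MODEL_URL"], "/data", "model.gguf")` before starting the server, where `MODEL_URL`, `/data`, and `model.gguf` are placeholders you would swap for your own values.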

Hope this helps!
