Build and deploy llama.cpp server on fly.io

hazelnutcloud · June 20, 2024, 4:35am

Hi all, I’ve written a Dockerfile that builds and deploys a llama.cpp server on fly.io, along with the matching fly.toml configuration file. Here’s the GitHub repo.

It uses as few dependencies as possible to keep the image small. Model files are downloaded on the first boot and cached in a volume, so subsequent cold starts are fast.
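For anyone curious what the download-once-then-cache step looks like, here is a rough sketch in Python. It is not the code from the repo (which may well do this in a shell entrypoint); the function name `ensure_model`, the `fetch` parameter, and the idea of writing to a temporary `.part` file first are all my own assumptions for illustration.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def ensure_model(url, cache_dir, fetch=urllib.request.urlretrieve):
    """Return the local path of the model file, downloading it only if missing.

    `cache_dir` is assumed to sit on a persistent volume, so the file
    survives machine restarts. `fetch` is a hypothetical hook that lets
    the downloader be swapped out (for example in tests).
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Name the cached file after the last path segment of the URL.
    target = cache_dir / url.rsplit("/", 1)[-1]
    if target.exists():
        return target  # cache hit: nothing to download on this boot

    # Download to a temporary file in the same directory, then rename,
    # so an interrupted download never leaves a half-written model behind.
    fd, tmp = tempfile.mkstemp(dir=cache_dir, suffix=".part")
    os.close(fd)
    try:
        fetch(url, tmp)
        os.replace(tmp, target)
    finally:
        # Only still present if the download or rename failed.
        if os.path.exists(tmp):
            os.remove(tmp)
    return target
```

An entrypoint would call this with the model URL and the volume's mount path before starting the server, then pass the returned path to llama.cpp. On the first boot it downloads; on later boots it returns immediately.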

Hope this helps!
