Hi all, I’ve been trying to build and run the llama.cpp server binary. Here’s my Dockerfile:
# base: Ubuntu 22.04 with NVIDIA's CUDA apt repository added
FROM ubuntu:22.04 AS base
RUN apt update -q && apt install -y ca-certificates wget && \
    wget -qO /cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
    dpkg -i /cuda-keyring.deb && apt update -q

# builder: compile llama-server with CUDA and libcurl support
FROM base AS builder
RUN apt install -y --no-install-recommends git cuda-nvcc-12-0 libcublas-dev-12-0 libcurl4-openssl-dev
ENV PATH=$PATH:/usr/local/cuda/bin
RUN git clone --depth 1 https://github.com/ggerganov/llama.cpp.git /llama.cpp
RUN cd /llama.cpp && \
    make LLAMA_CUDA=1 LLAMA_CURL=1 llama-server

# runtime: slim image with just the CUDA runtime libraries and the built binary
FROM base AS runtime
WORKDIR /app
RUN apt install -y --no-install-recommends cuda-cudart-12-0 libcudnn8
RUN mkdir -p /models
COPY --from=builder /llama.cpp/llama-server /app/llama-server
COPY ./llama-server.sh /app/llama-server.sh
RUN chmod +x /app/llama-server.sh
CMD ["/app/llama-server.sh"]
The build stage works fine. However, when I run the image, I hit this error:
/app/llama-server: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory
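In case it helps with diagnosis, here's a minimal sketch of what I can run inside the container to inspect the binary's dynamic dependencies; nothing llama.cpp-specific, just standard tooling, and the exact output will vary by image:

# List the binary's shared-library dependencies and flag any unresolved ones
ldd /app/llama-server | grep "not found"
# Check whether anything in the image actually ships libcuda.so.1
find / -name 'libcuda.so*' 2>/dev/null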
All the help I found online suggests installing the NVIDIA Container Toolkit or using an NVIDIA runtime base image. However, I’m trying to stay close to the guidelines in the Fly documentation and keep the image as slim as possible. Is there a library I’m missing that would explain the "cannot open shared object file" error? For reference, the commonly suggested alternative looks roughly like the sketch below, which is what I'm hoping to avoid.
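This is my understanding of the suggestion, not something I've tested, and the exact nvidia/cuda tag here is my assumption:

# Commonly suggested approach: start the runtime stage from NVIDIA's prebuilt
# runtime image instead of plain Ubuntu (much larger base image)
FROM nvidia/cuda:12.0.1-runtime-ubuntu22.04 AS runtime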
Thank you!