I’ve been trying to run LLMs via the llama-cpp-python library, which requires CMake, CUDA, and its supporting libraries (cuBLAS) in order to run with GPU acceleration. Below is my Dockerfile (borrowed heavily from the Fly GPU quickstart docs):
FROM ubuntu:22.04
RUN apt update -q \
&& apt install -y ca-certificates cmake gcc-11 g++-11 git parallel wget ffmpeg python3.10 python3-pip \
&& wget -qO /cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb \
&& dpkg -i /cuda-keyring.deb \
&& apt update -q \
&& apt install -y cuda-nvcc-12-2 libcublas-12-2 libcudnn8 cuda-libraries-12-2
# Install pip pkgs to run python script to download, transcribe, and upload videos
WORKDIR /app
COPY . .
ENV CUDA_DOCKER_ARCH=all
ENV LLAMA_CUBLAS=1
RUN pip install -r requirements.txt \
&& pip install flash-attn --no-build-isolation \
&& CUDACXX=/usr/local/cuda-12.2/bin/nvcc CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=native" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade
CMD ["python3", "transcribe_yt_videos.py"]
However, the build fails with this error:
Target "ggml" links to:
CUDA::cublas
but the target was not found. Possible reasons include:
* There is a typo in the target name.
* A find_package call is missing for an IMPORTED target.
* An ALIAS target is missing.
CMake Generate step failed. Build files cannot be regenerated correctly.
I thought it was an issue with the library, but when I SSHed into the VM and ran ls /usr/local/cuda-12.2/include | grep cublas
to verify cuBLAS was installed, there were no cuBLAS header files! That, along with this issue, indicates to me that something went wrong during the installation process.
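For anyone wanting to reproduce the check, here is a minimal sketch of what I ran, wrapped in a function so the include path (CUDA 12.2's default location) can be overridden; the function name and wrapper are mine, not part of the original Dockerfile:

```shell
# Sketch: report whether any cuBLAS headers exist in the CUDA include dir.
# Pass a directory to override the CUDA 12.2 default path.
check_cublas_headers() {
  local inc_dir="${1:-/usr/local/cuda-12.2/include}"
  if ls "$inc_dir" 2>/dev/null | grep -q cublas; then
    echo "cuBLAS headers found in $inc_dir"
  else
    echo "no cuBLAS headers in $inc_dir"
  fi
}
```

On the broken image this prints the "no cuBLAS headers" branch, since only the runtime shared libraries were installed.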
I eventually got this working by using an NVIDIA CUDA 12.2.2 Ubuntu 22.04 base image instead, but I want to understand why this Dockerfile setup doesn’t install the cuBLAS headers despite explicit commands to install the CUDA toolkit and related libraries.
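For reference, the working version looked roughly like this. This is a sketch, not my exact file: I'm assuming the nvidia/cuda:12.2.2-devel-ubuntu22.04 tag here (the -devel variants ship nvcc and the CUDA headers, unlike the -runtime ones), so the cuda-keyring and apt CUDA steps drop out entirely:

```dockerfile
# Sketch: -devel base image already bundles nvcc, cuBLAS headers, and CUDA libs.
FROM nvidia/cuda:12.2.2-devel-ubuntu22.04

RUN apt update -q \
 && apt install -y ca-certificates cmake git parallel wget ffmpeg python3.10 python3-pip

WORKDIR /app
COPY . .

ENV CUDA_DOCKER_ARCH=all
ENV LLAMA_CUBLAS=1

RUN pip install -r requirements.txt \
 && CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade

CMD ["python3", "transcribe_yt_videos.py"]
```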