Attempting to run llama-cpp-python on an A100-40GB GPU server (SIGILL)

I have an app that uses llama-cpp-python. On a normal CPU server it runs fine; however, when I run it on a GPU server it crashes with:

`Main child exited with signal (with signal 'SIGILL', core dumped? false)`
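For context, this is roughly how the app loads the model (the model path and parameters here are stand-ins for my actual config, not the exact values I use):

```python
# Minimal snippet that hits the crash on the GPU machine.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to the GPU
)
print(llm("Hello", max_tokens=16))
```

The same code works on the CPU machine, so I suspect the llama-cpp-python build (or prebuilt wheel) was compiled with CPU instructions the GPU host's CPU doesn't support, which would explain the SIGILL, but I haven't been able to confirm that.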
Does anyone know what’s going on? Has anyone deployed llama-cpp-python successfully on a fly.io GPU server? Thanks!