I have an app that uses llama-cpp-python. On a regular CPU server it runs fine, but when I run it on a GPU server it crashes with:

Main child exited with signal (with signal 'SIGILL', core dumped? false)
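My guess (and it is only a guess) is that SIGILL means the llama.cpp binary inside the wheel was compiled with CPU instructions (e.g. AVX2 or AVX-512) that the GPU host's CPU doesn't advertise. Here's a small script I put together to compare CPU flags between the two machines; the list of instruction sets is just my guess at the relevant ones:

```python
# Quick diagnostic (assumption on my part: SIGILL usually means the binary
# was built with CPU instructions the host lacks). Reads the host's CPU
# feature flags from /proc/cpuinfo and reports which common llama.cpp
# build targets are present or missing.
flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break

for isa in ("sse3", "ssse3", "avx", "avx2", "f16c", "fma", "avx512f"):
    print(f"{isa}: {'present' if isa in flags else 'MISSING'}")
```

If the flags do differ, I assume rebuilding llama-cpp-python from source on the target machine would fix it, but I'd like to confirm that before changing my build.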
Does anyone know what’s going on? Has anyone deployed llama-cpp-python successfully on a fly.io GPU server? Thanks!