Build failed with elixir bumblebee

gonglexin · October 7, 2023, 5:49am

I’m trying to build a Phoenix app that uses Bumblebee, but the Docker image build fails when I try to cache the AI model from Hugging Face at build time, the way "Speed up your boot times with this one Dockerfile trick · The Phoenix Files" does.

Error message:

 => ERROR [builder 15/18] RUN mix run -e 'MyApp.Application.load_serving()' --no-start                                                                                                              8.0s 
------                                                                                                                                                                                                   
 > [builder 15/18] RUN mix run -e 'MyApp.Application.load_serving()' --no-start:                                                                                                                         
|==============
#0 5.675 [output clipped, log limit 100KiB/s reached]                                                                                                                                                    
|==========================================            |  78% (117.37/151.0
#0 6.084 [output clipped, log limit 100KiB/s reached]                                                                                                                                                    
#0 7.910 ** (exit) exited in: GenServer.call(EXLA.Client, {:client, :host, [platform: :host]}, :infinity)
#0 7.910     ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
#0 7.910     (elixir 1.15.6) lib/gen_server.ex:1063: GenServer.call/3
#0 7.910     (exla 0.6.1) lib/exla/backend.ex:154: EXLA.Backend.client_and_device_id/1
#0 7.910     (exla 0.6.1) lib/exla/backend.ex:44: EXLA.Backend.from_binary/3
#0 7.910     (bumblebee 0.4.2) lib/bumblebee/conversion/pytorch/loader.ex:79: Bumblebee.Conversion.PyTorch.Loader.object_resolver/1
#0 7.910     (unpickler 0.1.0) lib/unpickler.ex:828: Unpickler.resolve_object/2
#0 7.910     (unpickler 0.1.0) lib/unpickler.ex:818: anonymous fn/2 in Unpickler.finalize_stack_items/2
#0 7.910     (elixir 1.15.6) lib/map.ex:957: Map.get_and_update/3
#0 7.910     nofile:1: (file)
------
Error: failed to fetch an image or build from source: error building: failed to solve: executor failed running [/bin/sh -c mix run -e 'MyApp.Application.load_serving()' --no-start]: exit code: 1

Here is my Dockerfile:

ARG ELIXIR_VERSION=1.15.6
ARG OTP_VERSION=26.1.1
ARG DEBIAN_VERSION=bullseye-20230612-slim

ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-debian-${DEBIAN_VERSION}"
ARG RUNNER_IMAGE="debian:${DEBIAN_VERSION}"

FROM ${BUILDER_IMAGE} as builder

# install build dependencies
RUN apt-get update -y && apt-get install -y build-essential git curl \
    && apt-get clean && rm -f /var/lib/apt/lists/*_*

# prepare build dir
WORKDIR /app

# install hex + rebar
RUN mix local.hex --force && \
    mix local.rebar --force

# set build ENV
ENV MIX_ENV="prod"
ENV BUMBLEBEE_CACHE_DIR=/app/.bumblebee

# install mix dependencies
COPY mix.exs mix.lock ./
RUN mix deps.get --only $MIX_ENV
RUN mkdir config

# copy compile-time config files before we compile dependencies
# to ensure any relevant config change will trigger the dependencies
# to be re-compiled.
COPY config/config.exs config/${MIX_ENV}.exs config/
RUN mix deps.compile

COPY priv priv

COPY lib lib

COPY assets assets

# compile assets
RUN mix assets.deploy

# Compile the release
RUN mix compile
# NEW HERE
# Download the HuggingFace models to cache them
RUN mix run -e 'MyApp.Application.load_serving()' --no-start

# Changes to config/runtime.exs don't require recompiling the code
COPY config/runtime.exs config/

COPY rel rel
RUN mix release

# start a new build stage so that the final image will only contain
# the compiled release and other runtime necessities
FROM ${RUNNER_IMAGE}

RUN apt-get update -y && apt-get install -y libstdc++6 openssl libncurses5 locales curl ffmpeg wget \
  && apt-get clean && rm -f /var/lib/apt/lists/*_*

# Set the locale
RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen

ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

WORKDIR "/app"
RUN chown nobody /app

# set runner ENV
ENV MIX_ENV="prod"

# Only copy the final release from the build stage
COPY --from=builder --chown=nobody:root /app/_build/${MIX_ENV}/rel/myapp ./
COPY --from=builder --chown=nobody:root /app/.bumblebee/ ./.bumblebee

USER nobody

# NEW HERE
ENV BUMBLEBEE_CACHE_DIR=/app/.bumblebee
ENV BUMBLEBEE_OFFLINE=true

CMD ["/app/bin/server"]

Does anybody have a clue on how to fix this?

darkcheftar007 · October 7, 2023, 7:00am

This seems related to the following thread:
https://elixirforum.com/t/exit-no-process-the-process-is-not-alive-or-theres-no-process-currently-associated-with-the-given-name-possibly-because-its-application-isnt-started/4368/8
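
The stack trace in the question ends in a GenServer.call to EXLA.Client that finds no process, and the build step runs with --no-start, so one plausible reading is that the EXLA application was simply never started. Below is a minimal, untested sketch of starting the needed applications by hand before loading the serving. The BuildWarmup module and its function name are hypothetical, and :crypto only stands in for :exla so that the snippet runs without the project's dependencies.

```elixir
defmodule BuildWarmup do
  # Hypothetical helper; not part of Bumblebee, EXLA, or the original app.

  # Starts each application (and its dependencies), then runs the function.
  # Raises a MatchError if an application cannot be started.
  def with_apps(apps, fun) when is_list(apps) and is_function(fun, 0) do
    Enum.each(apps, fn app ->
      {:ok, _started} = Application.ensure_all_started(app)
    end)

    fun.()
  end
end

# Stand-in call so the snippet runs anywhere. For the Dockerfile step in the
# question, the assumption is that the list would be [:exla] and the function
# would be &MyApp.Application.load_serving/0.
result = BuildWarmup.with_apps([:crypto], fn -> :loaded end)
IO.inspect(result)
```

With such a helper compiled into the app, the build step could look like mix run --no-start -e 'BuildWarmup.with_apps([:exla], &MyApp.Application.load_serving/0)'. Dropping --no-start is the other option, at the cost of starting the whole application during the build.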

LuchoTurtle · November 14, 2023, 2:23pm

I’ve spent more time than I care to admit trying to get something like this to work as well, but I keep getting stumped.

That command fails because of the error above. You can work around it by adding ; return 0 at the end of it, which lets the model download go through.

However, even if you set BUMBLEBEE_CACHE_DIR and BUMBLEBEE_OFFLINE only after the model is downloaded, the application won’t run correctly: it throws an error saying it can’t find the model files (even though they’re clearly there - I tested it whilst running in a Docker instance).

You can find more information about me struggling with it in Testing Image-To-Text · Issue #131 · dwyl/imgup · GitHub.

matthewford · November 17, 2023, 1:41pm

Did you manage to resolve this? We’ve yet to deploy Bumblebee for Whisper, but now I’m thinking perhaps we need to deploy it as a separate service.

jstiebs · November 17, 2023, 1:47pm

If you follow the issue, he ended up downloading the model on boot. He was attempting to download the model in the Dockerfile but was checking for it on the volume. When downloading these very large models, it’s best to download them to a volume once so they are ready to go on boot or deploy.

LuchoTurtle · November 17, 2023, 6:44pm

Yeah, as jstiebs said, I ended up downloading the model on startup instead of trying to do the work in the Dockerfile. That approach was unfortunately too inconsistent, and I kept stumbling upon errors of Bumblebee not managing to find the models.

Pretty much, there are two things you have to take into account:

- Your BUMBLEBEE_CACHE_DIR ought to be defined in your Dockerfile, and it should be the same path as your volume on fly.io (you can see my Dockerfile here). For example, you should have ENV BUMBLEBEE_CACHE_DIR="/app/.bumblebee/" (or any other path) in your Dockerfile in both the builder and runtime stages.
- You verify and load the models on app startup (application.ex). I moved all the logic to a different file that loads the models conditionally according to the environment so I can test them as well.
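
The two points above could be sketched roughly as follows. This is not the code from the PR: the MyApp.ModelCache module, its function names, and the "openai/whisper-tiny" repository are illustrative assumptions, and the "is it cached?" check is deliberately crude. The :cache_dir and :offline keys are, as far as I know, the repository options Bumblebee.load_model/1 accepts in an {:hf, id, opts} tuple.

```elixir
defmodule MyApp.ModelCache do
  # Hypothetical module; names and the default path are illustrative only.
  @default_dir "/app/.bumblebee/"

  # Reads the same variable the Dockerfile sets; on fly.io it should point
  # at the mounted volume.
  def cache_dir, do: System.get_env("BUMBLEBEE_CACHE_DIR", @default_dir)

  # Crude check: any entry in the directory counts as "already downloaded".
  # It does not verify that a particular model's files are complete.
  def populated?(path \\ cache_dir()) do
    match?({:ok, [_ | _]}, File.ls(path))
  end

  # Builds a repository tuple for Bumblebee.load_model/1. With offline: true
  # only cached files are used, so a wrong guess here ends in a load error.
  def repository(repo_id, path \\ cache_dir()) do
    {:hf, repo_id, cache_dir: path, offline: populated?(path)}
  end
end

# Demo against an empty temporary directory standing in for the volume.
tmp = Path.join(System.tmp_dir!(), "bumblebee_cache_demo")
File.rm_rf!(tmp)
File.mkdir_p!(tmp)
IO.inspect(MyApp.ModelCache.repository("openai/whisper-tiny", tmp))
```

On the first boot the volume is empty, so the tuple asks for a download; on later boots the same call switches to offline loading from the volume.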

Don’t forget that Mix.env() doesn’t work on fly.io machines (the app runs as a release, and Mix is not part of a release), so you ought to get by with env variables.
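
One way to do this is to read the environment from a variable at runtime. The sketch below is only an illustration: the MyApp.RuntimeEnv module and the APP_ENV variable name are assumptions, not a fly.io or Phoenix convention, and the variable would have to be set by you (for example in the Dockerfile or in the fly.io config).

```elixir
defmodule MyApp.RuntimeEnv do
  # Hypothetical module; APP_ENV is an assumed variable name.

  # Returns :dev, :test or :prod based on APP_ENV, falling back to the
  # default when the variable is unset or has an unexpected value.
  def current(default \\ :prod) do
    case System.get_env("APP_ENV") do
      "dev" -> :dev
      "test" -> :test
      "prod" -> :prod
      _ -> default
    end
  end

  # Example policy: skip model loading in tests, load it everywhere else.
  def load_models?, do: current() != :test
end

System.put_env("APP_ENV", "test")
IO.inspect(MyApp.RuntimeEnv.load_models?())
```

A check like load_models?/0 can then guard the model-loading children in application.ex without touching Mix at runtime.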

You can check my PR if you want to see what I did. In the same PR, I created a small guide that follows what I did to get it working :).

LuchoTurtle · November 21, 2023, 3:28pm

I’ve made some changes to the guide.
The previous way I was managing models wasn’t optimal, so I’ve made it so each one has a dedicated folder that is cached and can be extended if you want to use other models without having to purge the model cache.
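
For readers who want the general idea without opening the guide, here is a small sketch of one folder per model. It is not the layout from the guide: the MyApp.ModelDirs module, the "--" naming scheme, and the example repository names are all assumptions made for illustration.

```elixir
defmodule MyApp.ModelDirs do
  # Hypothetical module; the folder naming scheme is an assumption.

  # Maps a Hugging Face repository id to its own folder under the base
  # cache directory, e.g. "openai/whisper-tiny" -> "<base>/openai--whisper-tiny".
  def dir_for(base, repo_id) do
    Path.join(base, String.replace(repo_id, "/", "--"))
  end

  # Repository tuple for Bumblebee.load_model/1 that caches this model in
  # its dedicated folder, so other models' folders are left untouched.
  def repository(base, repo_id) do
    {:hf, repo_id, cache_dir: dir_for(base, repo_id)}
  end
end

base = "/app/.bumblebee"
IO.inspect(MyApp.ModelDirs.dir_for(base, "openai/whisper-tiny"))
IO.inspect(MyApp.ModelDirs.repository(base, "openai/whisper-small"))
```

Adding another model then only creates a new folder next to the existing ones, and removing a model means deleting a single folder instead of purging the whole cache.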

Cheers