FLAME Backend often stuck at "Pulling container image..."

Hey,

I’m currently building an app with the FLAME pattern and while it often works great, 40-50% of the time it’s stuck on “Pulling container image…”. I read forum posts regarding that problem with regular deploys, switched regions for my app, but it still acts the same.

I actually see the flame machine starting up in my machines dashboard but it still shows the pause :pause_button: symbol. After the timeout hits after 30 seconds, I usually see the machine turning green right away and then disappear after it has been idling for a few seconds.

Here’s the output from the log. as you can see after the 30 seconds have passed the flame machine is successfully reporting in and then idling for a bit before disappearing again.

2023-12-11T06:01:26.426 app[3d8d7413be6338] iad [info] 06:01:26.425 [info] Starting demuxing process...

2023-12-11T06:01:26.685 runner[e78423eece1e28] iad [info] Pulling container image registry.fly.io/demux:deployment-01HHBP2W6T2P20GSZ5X7DBR21E

2023-12-11T06:01:56.426 app[3d8d7413be6338] iad [info] 06:01:56.425 [error] GenServer #PID<0.2525.0> terminating

2023-12-11T06:01:56.426 app[3d8d7413be6338] iad [info] ** (stop) exited in: FLAME.Pool.place_child(Demux.FFMpegRunner, {Demux.VideoProcessor, {#PID<0.2525.0>, #Reference<0.3700968693.2684092417.156198>, #PID<0.2525.0>, [input_path: "/app/uploads/301e351b-991e-482e-8aa6-366620755789", caller: #PID<0.2525.0>, parent_stream: %File.Stream{path: "/app/uploads/301e351b-991e-482e-8aa6-366620755789", modes: [:raw, :read_ahead, :binary], line_or_bytes: 2028, raw: true, node: :"demux-01HHBP2W6T2P20GSZ5X7DBR21E@fdaa:0:258c:a7b:23d:f627:1906:2"}, parent_audio_stream: %File.Stream{path: "/app/uploads/301e351b-991e-482e-8aa6-366620755789.mp3", modes: [:raw, :read_ahead, :binary], line_or_bytes: 2028, raw: true, node: :"demux-01HHBP2W6T2P20GSZ5X7DBR21E@fdaa:0:258c:a7b:23d:f627:1906:2"}]}}, [])

2023-12-11T06:01:56.426 app[3d8d7413be6338] iad [info] ** (EXIT) time out

2023-12-11T06:01:56.426 app[3d8d7413be6338] iad [info] (flame 0.1.5) lib/flame/pool.ex:238: FLAME.Pool.exit!/3

2023-12-11T06:01:56.426 app[3d8d7413be6338] iad [info] (demux 0.1.0) lib/demux/video_processor.ex:26: Demux.VideoProcessor.start/1

2023-12-11T06:01:56.426 app[3d8d7413be6338] iad [info] (demux 0.1.0) lib/demux_web/live/home_live.ex:59: anonymous fn/2 in DemuxWeb.HomeLive.handle_progress/3

2023-12-11T06:01:56.426 app[3d8d7413be6338] iad [info] (phoenix_live_view 0.20.1) lib/phoenix_live_view/upload_channel.ex:20: Phoenix.LiveView.UploadChannel.consume/3

2023-12-11T06:01:56.426 app[3d8d7413be6338] iad [info] (elixir 1.15.6) lib/enum.ex:1693: Enum."-map/2-lists^map/1-1-"/2

2023-12-11T06:01:56.426 app[3d8d7413be6338] iad [info] (phoenix_live_view 0.20.1) lib/phoenix_live_view/upload.ex:264: Phoenix.LiveView.Upload.consume_uploaded_entry/3

2023-12-11T06:01:56.426 app[3d8d7413be6338] iad [info] (demux 0.1.0) lib/demux_web/live/home_live.ex:55: DemuxWeb.HomeLive.handle_progress/3

2023-12-11T06:01:56.426 app[3d8d7413be6338] iad [info] (phoenix_live_view 0.20.1) lib/phoenix_live_view/channel.ex:181: anonymous fn/4 in Phoenix.LiveView.Channel.handle_info/2

2023-12-11T06:01:56.427 app[3d8d7413be6338] iad [info] Last message: %Phoenix.Socket.Message{topic: "lv:phx-F5-xl2DUYhdmtwJx", event: "progress", payload: %{"entry_ref" => "0", "event" => nil, "progress" => 100, "ref" => "phx-F5-xl3sWvf9kxwKR"}, ref: "263", join_ref: "4"}

2023-12-11T06:02:02.003 runner[e78423eece1e28] iad [info] Successfully prepared image registry.fly.io/demux:deployment-01HHBP2W6T2P20GSZ5X7DBR21E (35.31758922s)

2023-12-11T06:02:02.568 runner[e78423eece1e28] iad [info] Configuring firecracker

2023-12-11T06:02:02.827 app[e78423eece1e28] iad [info] [ 0.145765] Spectre V2 : WARNING: Unprivileged eBPF is enabled with eIBRS on, data leaks possible via Spectre v2 BHB attacks!

2023-12-11T06:02:02.860 app[e78423eece1e28] iad [info] [ 0.155205] PCI: Fatal: No config space access function found

2023-12-11T06:02:03.052 app[e78423eece1e28] iad [info] INFO Starting init (commit: 15238e9)...

2023-12-11T06:02:03.068 app[e78423eece1e28] iad [info] INFO Preparing to run: `/app/bin/server` as nobody

2023-12-11T06:02:03.073 app[e78423eece1e28] iad [info] INFO [fly api proxy] listening at /.fly/api

2023-12-11T06:02:03.081 app[e78423eece1e28] iad [info] 2023/12/11 06:02:03 listening on [fdaa:0:258c:a7b:22c:35c7:900a:2]:22 (DNS: [fdaa::3]:53)

2023-12-11T06:02:04.105 app[3d8d7413be6338] iad [info] 06:02:04.104 request_id=F5-xqR9s3LGB6O8AAALh [info] GET /

2023-12-11T06:02:04.106 app[3d8d7413be6338] iad [info] 06:02:04.105 request_id=F5-xqR9s3LGB6O8AAALh [info] Sent 200 in 986µs

2023-12-11T06:02:04.405 app[3d8d7413be6338] iad [info] 06:02:04.405 [info] CONNECTED TO Phoenix.LiveView.Socket in 22µs

2023-12-11T06:02:04.405 app[3d8d7413be6338] iad [info] Transport: :websocket

2023-12-11T06:02:04.405 app[3d8d7413be6338] iad [info] Serializer: Phoenix.Socket.V2.JSONSerializer

2023-12-11T06:02:04.405 app[3d8d7413be6338] iad [info] Parameters: %{"_csrf_token" => "LlgydjFuWhRZCwMZDV0MOh8uBS81JDwEHksDW44-nYTvYhEtwKLfETpb", "_live_referer" => "undefined", "_mounts" => "0", "_track_static" => %{"0" => "https://demux.fly.dev/assets/app-57e01b25244afe4043b664679b8cfda3.css?vsn=d", "1" => "https://demux.fly.dev/assets/app-2f4eeb613965956fb8b68d387bd72f27.js?vsn=d"}, "vsn" => "2.0.0"}

2023-12-11T06:02:05.532 app[e78423eece1e28] iad [info] 06:02:05.532 [info] starting with parent %FLAME.Parent{pid: #PID<46278.2552.0>, ref: #Reference<46278.3700968693.2684092417.156212>, backend: FLAME.FlyBackend}

2023-12-11T06:02:05.537 app[e78423eece1e28] iad [info] 06:02:05.536 [info] connect (1) :"demux-01HHBP2W6T2P20GSZ5X7DBR21E@fdaa:0:258c:a7b:23d:f627:1906:2": true

2023-12-11T06:02:06.077 app[e78423eece1e28] iad [info] WARN Reaped child process with pid: 363 and signal: SIGUSR1, core dumped? false

2023-12-11T06:02:15.539 app[e78423eece1e28] iad [info] 06:02:15.538 [info] FLAME.Terminator.system_stop: idle shutdown

2023-12-11T06:02:16.089 app[e78423eece1e28] iad [info] WARN Reaped child process with pid: 366 and signal: SIGUSR1, core dumped? false

2023-12-11T06:02:19.092 app[e78423eece1e28] iad [info] INFO Main child exited normally with code: 0

2023-12-11T06:02:19.092 app[e78423eece1e28] iad [info] INFO Starting clean up.

2023-12-11T06:02:19.093 app[e78423eece1e28] iad [info] WARN hallpass exited, pid: 307, status: signal: 15 (SIGTERM)

2023-12-11T06:02:19.098 app[e78423eece1e28] iad [info] 2023/12/11 06:02:19 listening on [fdaa:0:258c:a7b:22c:35c7:900a:2]:22 (DNS: [fdaa::3]:53)

2023-12-11T06:02:20.094 app[e78423eece1e28] iad [info] [ 17.387825] reboot: Restarting system

2023-12-11T06:02:20.232 runner[e78423eece1e28] iad [info] machine restart policy set to 'no', not restarting

So far I’ve used IAD and MAD as regions, but I’m not sure if that’s relevant at all. Does anybody have an idea what I could improve here?

Thank you!

How big is your image? The first time an image is pulled to a new host (different host from the parent ), it can take longer but shouldn’t be timing out like this unless we’re having issues with our registry. Can you try to configure the boot_timeout of the pool to something like 120_000 and see if fly is able to eventually pull the image? It will log how long the image pull took. I will check on our side to see if the registry in iad is acting up.

1 Like

Hey Chris, thanks for getting back to me!

That was good info. At first I thought, it really only happens for the first time, but I did a bunch of more tests and it happened the same way with try 4 or 5.

The image size is 2.1GB, it’s a brandnew phoenix app & ffmpeg, everything else is default. I tried the boot_timeout with 120_000 but that didn’t change anything. I also set the timeout of the Genserver (receive/1 block after place_child) to 60s (was 20), but that also did nothing. It always seems to blow up at exactly 30s. Here’s new output from just now:

2023-12-11T14:07:56.133 app[3d8d7413be6338] iad [info] 14:07:56.132 [info] File Upload done. Copying file to /app/uploads/6838511d-2a4f-4837-9780-58be5c673126

2023-12-11T14:07:56.153 app[3d8d7413be6338] iad [info] 14:07:56.151 [info] Starting demuxing process...

2023-12-11T14:07:56.392 runner[148ed22c749228] iad [info] Pulling container image registry.fly.io/demux:deployment-01HHCJNZGPHATB69Y2XK4KGHAK

2023-12-11T14:08:26.154 app[3d8d7413be6338] iad [info] 14:08:26.152 [error] GenServer #PID<0.2126.0> terminating

2023-12-11T14:08:26.154 app[3d8d7413be6338] iad [info] ** (stop) exited in: FLAME.Pool.place_child(Demux.FFMpegRunner, {Demux.VideoProcessor, {#PID<0.2126.0>, #Reference<0.3708040946.4104126465.48062>, #PID<0.2126.0>, [input_path: "/app/uploads/6838511d-2a4f-4837-9780-58be5c673126", caller: #PID<0.2126.0>, parent_stream: %File.Stream{path: "/app/uploads/6838511d-2a4f-4837-9780-58be5c673126", modes: [:raw, :read_ahead, :binary], line_or_bytes: 2028, raw: true, node: :"demux-01HHCJNZGPHATB69Y2XK4KGHAK@fdaa:0:258c:a7b:23d:f627:1906:2"}, parent_audio_stream: %File.Stream{path: "/app/uploads/6838511d-2a4f-4837-9780-58be5c673126.mp3", modes: [:raw, :read_ahead, :binary], line_or_bytes: 2028, raw: true, node: :"demux-01HHCJNZGPHATB69Y2XK4KGHAK@fdaa:0:258c:a7b:23d:f627:1906:2"}]}}, [])

2023-12-11T14:08:26.154 app[3d8d7413be6338] iad [info] ** (EXIT) time out

2023-12-11T14:08:26.154 app[3d8d7413be6338] iad [info] (flame 0.1.5) lib/flame/pool.ex:238: FLAME.Pool.exit!/3

2023-12-11T14:08:26.154 app[3d8d7413be6338] iad [info] (demux 0.1.0) lib/demux/video_processor.ex:26: Demux.VideoProcessor.start/1

2023-12-11T14:08:26.154 app[3d8d7413be6338] iad [info] (demux 0.1.0) lib/demux_web/live/home_live.ex:59: anonymous fn/2 in DemuxWeb.HomeLive.handle_progress/3

2023-12-11T14:08:26.154 app[3d8d7413be6338] iad [info] (phoenix_live_view 0.20.1) lib/phoenix_live_view/upload_channel.ex:20: Phoenix.LiveView.UploadChannel.consume/3

2023-12-11T14:08:26.154 app[3d8d7413be6338] iad [info] (elixir 1.15.6) lib/enum.ex:1693: Enum."-map/2-lists^map/1-1-"/2

2023-12-11T14:08:26.154 app[3d8d7413be6338] iad [info] (phoenix_live_view 0.20.1) lib/phoenix_live_view/upload.ex:264: Phoenix.LiveView.Upload.consume_uploaded_entry/3

2023-12-11T14:08:26.154 app[3d8d7413be6338] iad [info] (demux 0.1.0) lib/demux_web/live/home_live.ex:55: DemuxWeb.HomeLive.handle_progress/3

2023-12-11T14:08:26.154 app[3d8d7413be6338] iad [info] (phoenix_live_view 0.20.1) lib/phoenix_live_view/channel.ex:181: anonymous fn/4 in Phoenix.LiveView.Channel.handle_info/2

2023-12-11T14:08:26.154 app[3d8d7413be6338] iad [info] Last message: %Phoenix.Socket.Message{topic: "lv:phx-F5_MJBs-30077wDR", event: "progress", payload: %{"entry_ref" => "0", "event" => nil, "progress" => 100, "ref" => "phx-F5_MJD-UUja8LwDx"}, ref: "263", join_ref: "4"}

2023-12-11T14:08:26.155 app[3d8d7413be6338] iad [info] 14:08:26.153 [error] GenServer #PID<0.2133.0> terminating

2023-12-11T14:08:26.155 app[3d8d7413be6338] iad [info] ** (stop) exited in: FLAME.Pool.place_child(Demux.FFMpegRunner, {Demux.VideoProcessor, {#PID<0.2126.0>, #Reference<0.3708040946.4104126465.48062>, #PID<0.2126.0>, [input_path: "/app/uploads/6838511d-2a4f-4837-9780-58be5c673126", caller: #PID<0.2126.0>, parent_stream: %File.Stream{path: "/app/uploads/6838511d-2a4f-4837-9780-58be5c673126", modes: [:raw, :read_ahead, :binary], line_or_bytes: 2028, raw: true, node: :"demux-01HHCJNZGPHATB69Y2XK4KGHAK@fdaa:0:258c:a7b:23d:f627:1906:2"}, parent_audio_stream: %File.Stream{path: "/app/uploads/6838511d-2a4f-4837-9780-58be5c673126.mp3", modes: [:raw, :read_ahead, :binary], line_or_bytes: 2028, raw: true, node: :"demux-01HHCJNZGPHATB69Y2XK4KGHAK@fdaa:0:258c:a7b:23d:f627:1906:2"}]}}, [])

2023-12-11T14:08:26.155 app[3d8d7413be6338] iad [info] ** (EXIT) time out

2023-12-11T14:08:26.155 app[3d8d7413be6338] iad [info] Last message: {:DOWN, #Reference<0.3708040946.4104126465.40325>, :process, #PID<0.2126.0>, {:timeout, {FLAME.Pool, :place_child, [Demux.FFMpegRunner, {Demux.VideoProcessor, {#PID<0.2126.0>, #Reference<0.3708040946.4104126465.48062>, #PID<0.2126.0>, [input_path: "/app/uploads/6838511d-2a4f-4837-9780-58be5c673126", caller: #PID<0.2126.0>, parent_stream: %File.Stream{path: "/app/uploads/6838511d-2a4f-4837-9780-58be5c673126", modes: [:raw, :read_ahead, :binary], line_or_bytes: 2028, raw: true, node: :"demux-01HHCJNZGPHATB69Y2XK4KGHAK@fdaa:0:258c:a7b:23d:f627:1906:2"}, parent_audio_stream: %File.Stream{path: "/app/uploads/6838511d-2a4f-4837-9780-58be5c673126.mp3", modes: [:raw, :read_ahead, :binary], line_or_bytes: 2028, raw: true, node: :"demux-01HHCJNZGPHATB69Y2XK4KGHAK@fdaa:0:258c:a7b:23d:f627:1906:2"}]}}, []]}}}

2023-12-11T14:08:30.099 proxy[3d8d7413be6338] iad [info] Downscaling app demux from 1 machines to 0 machines, stopping machine 3d8d7413be6338 (region=iad, process group=app)

2023-12-11T14:08:30.104 app[3d8d7413be6338] iad [info] INFO Sending signal SIGTERM to main child process w/ PID 314

2023-12-11T14:08:30.104 app[3d8d7413be6338] iad [info] 14:08:30.104 [notice] SIGTERM received - shutting down

2023-12-11T14:08:31.093 app[3d8d7413be6338] iad [info] WARN Reaped child process with pid: 374 and signal: SIGUSR1, core dumped? false

2023-12-11T14:08:32.096 app[3d8d7413be6338] iad [info] INFO Main child exited normally with code: 0

2023-12-11T14:08:32.096 app[3d8d7413be6338] iad [info] INFO Starting clean up.

2023-12-11T14:08:32.097 app[3d8d7413be6338] iad [info] INFO Umounting /dev/vdb from app/uploads

2023-12-11T14:08:32.101 app[3d8d7413be6338] iad [info] WARN hallpass exited, pid: 315, status: signal: 15 (SIGTERM)

2023-12-11T14:08:32.105 app[3d8d7413be6338] iad [info] 2023/12/11 14:08:32 listening on [fdaa:0:258c:a7b:23d:f627:1906:2]:22 (DNS: [fdaa::3]:53)

2023-12-11T14:08:33.100 app[3d8d7413be6338] iad [info] [ 333.667141] reboot: Restarting system

2023-12-11T14:08:34.794 runner[148ed22c749228] iad [info] Successfully prepared image registry.fly.io/demux:deployment-01HHCJNZGPHATB69Y2XK4KGHAK (38.402166575s)

2023-12-11T14:08:35.280 runner[148ed22c749228] iad [info] Configuring firecracker

2023-12-11T14:08:35.546 app[148ed22c749228] iad [info] [ 0.140976] PCI: Fatal: No config space access function found

2023-12-11T14:08:35.770 app[148ed22c749228] iad [info] INFO Starting init (commit: 15238e9)...

2023-12-11T14:08:35.788 app[148ed22c749228] iad [info] INFO Preparing to run: `/app/bin/server` as nobody

2023-12-11T14:08:35.795 app[148ed22c749228] iad [info] INFO [fly api proxy] listening at /.fly/api

2023-12-11T14:08:35.805 app[148ed22c749228] iad [info] 2023/12/11 14:08:35 listening on [fdaa:0:258c:a7b:ab8:6285:4e3a:2]:22 (DNS: [fdaa::3]:53)

2023-12-11T14:08:37.274 proxy[3d8d7413be6338] iad [error] timed out while connecting to your instance. this indicates a problem with your app (hint: look at your logs and metrics)

2023-12-11T14:08:37.440 proxy[3d8d7413be6338] iad [info] Starting machine

2023-12-11T14:08:37.632 app[3d8d7413be6338] iad [info] [ 0.038680] Spectre V2 : WARNING: Unprivileged eBPF is enabled with eIBRS on, data leaks possible via Spectre v2 BHB attacks!

2023-12-11T14:08:37.670 app[3d8d7413be6338] iad [info] [ 0.050942] PCI: Fatal: No config space access function found

2023-12-11T14:08:37.869 app[3d8d7413be6338] iad [info] INFO Starting init (commit: 15238e9)...

2023-12-11T14:08:37.888 app[3d8d7413be6338] iad [info] INFO Mounting /dev/vdb at app/uploads w/ uid: 65534, gid: 65534 and chmod 0755

2023-12-11T14:08:37.891 app[3d8d7413be6338] iad [info] INFO Resized app/uploads to 3204448256 bytes

2023-12-11T14:08:37.892 app[3d8d7413be6338] iad [info] INFO Preparing to run: `/app/bin/server` as nobody

2023-12-11T14:08:37.896 app[3d8d7413be6338] iad [info] INFO [fly api proxy] listening at /.fly/api

2023-12-11T14:08:37.905 app[3d8d7413be6338] iad [info] 2023/12/11 14:08:37 listening on [fdaa:0:258c:a7b:23d:f627:1906:2]:22 (DNS: [fdaa::3]:53)

2023-12-11T14:08:37.936 proxy[3d8d7413be6338] iad [info] machine started in 496.392621ms

2023-12-11T14:08:38.799 app[148ed22c749228] iad [info] WARN Reaped child process with pid: 363 and signal: SIGUSR1, core dumped? false

2023-12-11T14:08:38.960 app[148ed22c749228] iad [info] 14:08:38.959 [info] starting with parent %FLAME.Parent{pid: #PID<46278.2146.0>, ref: #Reference<46278.3708040946.4104126465.48076>, backend: FLAME.FlyBackend}

Notice how the second line (start demuxing) starts at 14:07:56. It starts pulling the container in the same second. Exactly 30s afterwards: Poof!:boom: At 14:08:26.
I think the error suggests that the Genserver is timing out, but I didn’t define any timeout values in init or any other callbacks. The code to run this is basically adopted from your thumbnail example (messy, all work in progress):

  def start(opts \\ []) do
    Keyword.validate!(opts, [:timeout, :caller, :input_path])
    timeout = Keyword.get(opts, :timeout, 120_000)
    caller = Keyword.get(opts, :caller, self())
    parent_ref = make_ref()
    parent = self()

    input_file = Keyword.get(opts, :input_path, "")
    parent_stream = File.stream!(input_file, [], 2028)
    parent_audio_stream = File.stream!(input_file <> ".mp3", [], 2028)
    opts = opts ++ [parent_stream: parent_stream, parent_audio_stream: parent_audio_stream]

    spec = {__MODULE__, {caller, parent_ref, parent, opts}}
    {:ok, pid} = FLAME.place_child(Demux.FFMpegRunner, spec)

    receive do
      {^parent_ref, %VideoProcessor{} = gen} ->
        %VideoProcessor{gen | pid: pid}
    after
      timeout ->
        Logger.error("Timeout after #{timeout}")
        exit(:timeout)
    end
  end

I’m sure I’m missing something here. I recorded a quick video of what that looks in practice:
https://starter-demo.wearebold.af/v/w56ve

Chris, I’m an idiot. It was indeed the image size. I tried all kinds of ways to debug and work around this (good way to learn FLAME by the way) and then it hit me that 2.1GB actually does sound pretty hefty to me. Turns out, I had some older, locally uploaded files inside the priv directory and didn’t have that docker ignored. Dang! That was a costly lesson… Thanks for the pointer and sorry about that! :see_no_evil: