Deploy failing due to push failing

We have a 10 GB layer from poetry install in our Docker image.

It’s installing Python packages for AI use. The push seems to restart after around 3-4 GB of data has been uploaded.

Is there a way to let the full 10 GB get pushed?

“You are doing it wrong,” I would say, even though the image size limit has been raised from 2 GB to 8 GB (ref: Docker image size limit raised from 2GB to 8GB).

Do not load models at the build stage; download them to a persistent volume instead (models probably do not change often) and use them from there.

There are actually no models loaded.
It’s just the NVIDIA cuBLAS portion of pip install torch.

I isolated it by installing each package individually and saw that this is the big layer:
RUN poetry add nvidia-cublas-cu11==

pip install torch==2.0.0

Can you share your Dockerfile, your fly.toml, and the output of the fly command you’re running with LOG_LEVEL=debug?

Sure. I’ll need a few hours, since I got past it in a hacky way by pip-installing our Poetry project and forcing pip to install the CPU-only version of PyTorch with:

RUN pip install torch torchvision torchaudio --index-url

It’s a bit hacky, though, as Poetry no longer tracks the install automatically. This reduced the layer down to 1.2 GB.
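The index URL is missing from the post above; for reference, the commonly documented PyTorch CPU wheel index is https://download.pytorch.org/whl/cpu, so the workaround would look roughly like this (a hedged sketch, not the poster’s exact Dockerfile):

```dockerfile
# Hedged sketch of the workaround: pull CPU-only wheels so the
# multi-GB CUDA dependencies (nvidia-cublas-cu11 etc.) are skipped.
RUN pip install torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/cpu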

So that was the fix to reduce the image size. I’ll grab the old Dockerfile in a few hours and post the prior versions of those files and the outputs.

I believe I am hitting the same issue: anywhere from 3.5 GB to 3.8 GB into the layer, the push retries. I’m also using PyTorch for machine learning purposes.

--> Building image done
==> Pushing image to fly
The push refers to repository []
0b7921a299c9: Pushed
ef6492d8b2c5: Pushed
b03f84644f36: Pushed
ea2cbc668e9d: Pushing  7.181GB/7.181GB
38b7afa4b510: Pushed
fe055d693f15: Pushed
45edac8e009c: Pushed
d82a965980ed: Pushed
9364cfd0203d: Pushed
b9044eea833a: Pushed
a2d7501dfb35: Pushed
Error: failed to fetch an image or build from source: error rendering push status stream: unknown: unknown error

It in fact did not push the entire 7.181 GB.

Maybe I can try the hacky solution referred to above.


# For more information, please refer to
FROM python:3.10-slim


# Keeps Python from generating .pyc files in the container
ENV PYTHONDONTWRITEBYTECODE=1

# Turns off buffering for easier container logging
ENV PYTHONUNBUFFERED=1

# Install system dependencies for building rhino3dm
RUN apt-get update && apt-get install -y \
    build-essential \
    cmake \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Install pip requirements
COPY requirements.txt .
RUN python -m pip install -r requirements.txt

COPY . /app

# Creates a non-root user with an explicit UID and adds permission to access the /app folder
# For more info, please refer to
RUN adduser -u 5678 --disabled-password --gecos "" appuser && chown -R appuser /app
USER appuser

# During debugging, this entry point will be overridden. For more information, please refer to
CMD ["gunicorn", "--bind", "", "server:app"]


# fly.toml app configuration file generated for graphtestrun on 2023-09-10T09:34:34-07:00
# See for information about how to use this file.

app = "graphtestrun"
primary_region = "sea"


[http_service]
  internal_port = 5002
  force_https = true
  auto_stop_machines = false
  auto_start_machines = true
  min_machines_running = 0
  processes = ["app"]

  [http_service.concurrency]
    type = "requests"
    soft_limit = 200
    hard_limit = 250

For me, the hacky solution also seems to have worked.

I edited my Dockerfile with these lines, as I didn’t need the other libraries, only torch:

# Install pip requirements
COPY requirements.txt .
RUN python -m pip install torch --index-url
RUN python -m pip install -r requirements.txt
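This ordering works because pip treats the already-installed CPU build of torch as satisfying any plain torch entry in requirements.txt, so the second install does not pull the CUDA wheels again (assuming any pinned version matches). A hedged sketch with the commonly documented CPU index URL filled in, since the post above omits it:

```dockerfile
COPY requirements.txt .
# Install CPU-only torch first; the CUDA extras (nvidia-cublas-cu11 etc.)
# are what push the layer past the size that fails to upload.
RUN python -m pip install torch --index-url https://download.pytorch.org/whl/cpu
# pip now sees torch as already satisfied and installs only the rest.
RUN python -m pip install -r requirements.txt
```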

I also had to run the following, from within the venv in VS Code:

flyctl scale memory 2048 -a graphtestrun