Having trouble getting initial app deployment to succeed, hangs at "Monitoring deployment"

BigglesZX · December 22, 2022, 10:04am

Hey folks,

I’m migrating a Django app from Heroku and am having issues getting an initial successful deployment. My deploys are hanging at the “Monitoring deployment” stage before eventually failing after several minutes. The only way I’ve managed to complete the process is by using --strategy immediate, but the app doesn’t run once deployed in this way.

This is my fly.toml (I’ve commented out the release_command to try to isolate the problem):

app = "<myapp>-web"
kill_signal = "SIGTERM"
kill_timeout = 30

[env]
  DJANGO_SETTINGS_MODULE = "<myapp>.config.production.settings"

[deploy]
#   release_command = "django-admin migrate --noinput"

[[services]]
  internal_port = 8080
  protocol = "tcp"

  [services.concurrency]
    hard_limit = 25
    soft_limit = 20
    type = "requests"

  [[services.ports]]
    force_https = true
    handlers = ["http"]
    port = "80"

  [[services.ports]]
    handlers = ["tls", "http"]
    port = "443"

  [[services.http_checks]]
    grace_period = "30s"
    interval = "15s"
    method = "get"
    path = "/"
    protocol = "http"
    restart_limit = 0
    timeout = "2s"
    tls_skip_verify = false

Note that I added the http_checks section later because I wondered whether having no health checks meant my app could never be reported as healthy. However, that doesn’t seem to have made any difference.
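For anyone wanting to rule the app itself out, the HTTP check in the config can be approximated locally. The sketch below is a stand-in under stated assumptions, not Fly's actual checker: `http_check` is a made-up helper name, it treats any 2xx response as passing, and it reuses the `/` path and 2s timeout from the config.

```python
import http.client


def http_check(host: str, port: int, path: str = "/", timeout: float = 2.0) -> bool:
    """Return True if GET path answers with a 2xx status within timeout.

    Hypothetical helper: treating only 2xx as "passing" is an assumption.
    Redirects are not followed, so a 301/302 counts as a failure here.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
        return 200 <= status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed response.
        return False
    finally:
        conn.close()
```

With the app running locally on port 8080, `http_check("127.0.0.1", 8080)` returning `False` would point at the app or the checked path rather than at the platform.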

This is my Dockerfile:

# syntax=docker/dockerfile:1

FROM python:3.8-slim AS base

# Set up environment
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONFAULTHANDLER=1
ENV PYTHONUNBUFFERED=1
ENV PIPENV_VENV_IN_PROJECT=1

# Start a new build stage to build dependencies
FROM base AS python-deps

# Install pipenv and compilation dependencies, including those for mysqlclient and pillow
RUN pip install pipenv
RUN apt-get update && apt-get install -y --no-install-recommends gcc build-essential python3-dev default-libmysqlclient-dev zlib1g-dev libjpeg62-turbo-dev libopenjp2-7-dev libtiff5-dev libwebp-dev libfreetype6 libfreetype6-dev \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Install python dependencies in /.venv
COPY Pipfile .
COPY Pipfile.lock .
RUN pipenv install --deploy

# Start a new build stage to receive the result of dependency installation
FROM base AS runtime

# Create and switch to a new user
RUN useradd --create-home --user-group appuser
WORKDIR /home/appuser
USER appuser

# Copy virtual env from python-deps stage
COPY --from=python-deps --chown=appuser:appuser /.venv /.venv

# Configure PATH to include venv bin directory, and PYTHONPATH to include app directory
# This mimics our previous use of `python setup.py develop`
ENV PATH="/.venv/bin:$PATH"
ENV PYTHONPATH="/home/appuser:$PYTHONPATH"

# Copy application into container
COPY . .

# Collect static files
RUN django-admin collectstatic --noinput --settings <myapp>.config.production.settings

# Expose port
EXPOSE 8080

# Run gunicorn
# note for readers – this was previously an ipv4-only binding
CMD ["gunicorn", "<myapp>.wsgi:application", "--bind", ":8080", "--workers", "4", "--access-logfile", "-", "--error-logfile", "-"]

The deploy process stops at the monitoring step for several minutes before failing:

==> Monitoring deployment
Logs: https://fly.io/apps/climb-bandits-api-web/monitoring

 1 desired, 1 placed, 0 healthy, 1 unhealthy
--> v7 failed - Failed due to unhealthy allocations - not rolling back to stable job version 7 as current job has same specification and deploying as v8

--> Troubleshooting guide at https://fly.io/docs/getting-started/troubleshooting/
Error abort

I’ve tried building and running the Dockerfile locally and it seems to work fine. Additionally, fly doctor reports no issues. Despite the release being marked as “unhealthy”, there are no checks listed in fly checks:

Health Checks for <myapp>-web
  NAME | STATUS | ALLOCATION | REGION | TYPE | LAST UPDATED | OUTPUT
-------*--------*------------*--------*------*--------------*---------

Most maddeningly, I don’t get any logs in the dashboard nor through fly logs.

What am I doing wrong? Thanks!

BigglesZX · December 22, 2022, 2:10pm

Quick update, as I learn more fly commands. Here is the output from fly vm status for the most recent instance of the app:

Instance
  ID            = 1d045f7f
  Process       = app
  Version       = 26
  Region        = lhr
  Desired       = run
  Status        = pending
  Health Checks =
  Restarts      = 0
  Created       = 1m22s ago

Events
TIMESTAMP           	TYPE      	MESSAGE
2022-12-22T14:06:39Z	Received  	Task received by client
2022-12-22T14:06:39Z	Task Setup	Building Task Directory
2022-12-22T14:06:43Z	Template  	Missing: vault.read(apps/data/555851/870811)

Checks
ID	SERVICE	STATE	OUTPUT

Recent Logs

I’ve tried ensuring that gunicorn now binds on ipv6 as well as ipv4, and I’ve added a TCP health check to the config, but can’t get any different results and – crucially – no log output anywhere.
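For readers checking the same thing, one way to confirm which address families actually accept connections is to probe the port over both loopback addresses. This is a minimal sketch, not anything from flyctl or gunicorn: `probe` and `probe_both` are hypothetical helper names, and port 8080 is taken from the config above.

```python
import socket
from typing import Dict


def probe(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or the address family is unavailable.
        return False


def probe_both(port: int) -> Dict[str, bool]:
    """Probe the IPv4 and IPv6 loopback addresses on the same port."""
    return {
        "ipv4": probe("127.0.0.1", port),
        "ipv6": probe("::1", port),
    }
```

Calling `probe_both(8080)` from inside the running container (or on a host where gunicorn is started directly) shows whether the server is reachable over each family. Published Docker ports can behave differently from the process's own binding, so treat a host-side result as a hint only.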

jerome · December 22, 2022, 3:22pm

It looks like, somehow, your secret didn’t make it to our database. The scheduler thinks it needs to fetch a secret but it can’t find it.

I have removed the non-existent one (that’s a weird thing to say) and it looks like the deploy went through.

Did you set 2 secrets? I see you hit an error trying to deploy an Upstash Redis instance. I wonder if that added a secret, then rolled back, and left your app in this weird state.

BigglesZX · December 22, 2022, 4:50pm

Hey @jerome, thanks for the quick reply. AFAIK I’ve only ever had the DATABASE_URL secret, so I assume this ghost secret arose from the issues with provisioning Upstash Redis yesterday.

Unfortunately I still can’t seem to successfully deploy from this end – despite commenting out both of my custom health checks and reducing the number of gunicorn workers to 1 to try to avoid possible OOM errors. Running vm status on my latest vm suggests the “missing: vault.read” error relating to secrets is still occurring.
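The `Missing: vault.read(...)` template event is easy to overlook in the status output. A small sketch that pulls those entries out of pasted `fly vm status` text is shown here; `find_missing_secrets` is a hypothetical helper, and the regular expression assumes the message format seen in the events earlier in this thread.

```python
import re
from typing import List

# Matches e.g. "Missing: vault.read(apps/data/555851/870811)".
# The message format is assumed from the output pasted in this thread.
_MISSING = re.compile(r"Missing:\s*vault\.read\(([^)]*)\)")


def find_missing_secrets(status_output: str) -> List[str]:
    """Return the vault paths reported as missing, in order of appearance."""
    return _MISSING.findall(status_output)
```

An empty list means no such event appears in the pasted text. It does not by itself mean the deployment is healthy.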

I still can’t see any other logs from deployment – any insights from your side?

Thanks again.

BigglesZX · December 23, 2022, 3:37pm

Just tried another deployment after leaving it for a day, and it seems OK now. Thanks for your help with this!
