Serverless Cold Start Causes Nginx 502 Bad Gateway (FastAPI + Nginx + Docker)

Hey everyone,

I’m running a FastAPI project on Fly.io, deployed as a microservice using Docker. I’m taking advantage of Fly.io’s serverless-like behavior: when there’s no incoming traffic for a while, the machine shuts down, and it spins up again when a request comes in. This is great for reducing costs — but I’m facing an issue with cold starts.

I currently have 2 machines in my Fly.io app. When there’s no traffic for some time, both go to sleep. The issue occurs when a request comes in while both machines are cold.

Here’s what happens:

  • A request hits the app → machines start spinning up.
  • For about 2–3 seconds, I consistently get 502 Bad Gateway (nginx) errors.
  • After that, FastAPI starts responding correctly.

So the backend eventually comes online, but nginx seems to start before FastAPI is ready. During those first few seconds, it’s probably trying to proxy requests to port 8000, but uvicorn isn’t listening yet.
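My nginx.conf isn't included above; a minimal sketch of the kind of reverse-proxy config in play (listen port taken from the Dockerfile's EXPOSE, upstream port from start.sh; everything else is simplified) would be:

```nginx
# Sketch only — the real nginx.conf is not shown in this thread.
events {}

http {
    server {
        listen 8080;

        location / {
            # Forward everything to uvicorn, which start.sh runs on :8000
            proxy_pass http://127.0.0.1:8000;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
```

With a setup like this, any request that arrives before uvicorn binds port 8000 gets a connection error from the upstream, which nginx reports as 502.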

Dockerfile:

# Stage 1: install Python dependencies and the app code
FROM python:3.11-slim AS backend

WORKDIR /app
COPY ./requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /app/requirements.txt
COPY . /app

# Stage 2: nginx image that also carries the Python runtime
FROM nginx:latest AS frontend
COPY ./nginx/nginx.conf /etc/nginx/nginx.conf
RUN ln -sf /usr/share/zoneinfo/Europe/Istanbul /etc/localtime

# Copy the Python interpreter, site-packages, and app from the backend stage
COPY --from=backend /usr/local /usr/local
COPY --from=backend /app /app

COPY start.sh /start.sh
RUN chmod +x /start.sh

# nginx listens here and proxies to uvicorn on 8000
EXPOSE 8080

CMD ["/start.sh"]

start.sh

#!/bin/bash

cd /app

# Start nginx in the background; it begins proxying immediately,
# even though uvicorn isn't listening yet
nginx -g 'daemon off;' &

# Replace the shell with uvicorn on the port nginx proxies to
exec uvicorn main:app --host 0.0.0.0 --port 8000

proxy_next_upstream and related directives may be helpful here. Per the nginx documentation, passing a request to the next server can be limited by the number of tries and by elapsed time (proxy_next_upstream_tries and proxy_next_upstream_timeout), so nginx can keep retrying while uvicorn finishes starting.
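A hedged sketch of that approach (not tested against this exact app; with a single backend, nginx needs more than one server entry in the upstream group to have a "next" server to try, so the same address is listed twice):

```nginx
upstream fastapi {
    # Same backend listed twice so a retry target exists;
    # max_fails=0 stops nginx from marking it as failed during startup
    server 127.0.0.1:8000 max_fails=0;
    server 127.0.0.1:8000 max_fails=0;
}

server {
    listen 8080;

    location / {
        proxy_pass http://fastapi;
        # Retry on connection errors, timeouts, and 502 responses,
        # bounded by attempt count and total elapsed time
        proxy_next_upstream error timeout http_502;
        proxy_next_upstream_tries 5;
        proxy_next_upstream_timeout 10s;
    }
}
```

The tries/timeout values are guesses sized to a 2–3 second cold start; they would need tuning for a slower boot.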

You can reduce (and possibly all but eliminate) this window by changing auto-stop in your fly.toml from off to suspend. At the moment this is limited to machines with at most 2GB of RAM, and some applications don't handle the clock skew that can occur on wake-up. But if it works for you, your application will be back up and running almost instantly.
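For anyone finding this later, a sketch of that fly.toml change (key names per Fly's [http_service] config section; the other values shown are illustrative, not from this thread):

```toml
[http_service]
  internal_port = 8080
  # "suspend" snapshots the machine instead of fully stopping it,
  # so wake-up is near-instant (previously "stop" or "off")
  auto_stop_machines = "suspend"
  auto_start_machines = true
  min_machines_running = 0
```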


Thank you very much for your quick response. Solving this on the nginx side using proxy_next_upstream makes a lot of sense.

Wouldn’t changing the auto_stop_machines setting from off to suspend affect the cost? I assumed it would likely incur higher charges.

As an alternative, I solved the issue by updating my start.sh script as follows:

#!/bin/bash

cd /app

uvicorn main:app --host 0.0.0.0 --port 8000 &
FASTAPI_PID=$!

echo "Waiting for FastAPI to start..."
MAX_RETRIES=15
RETRY_COUNT=0

until curl -s http://127.0.0.1:8000/health > /dev/null 2>&1; do
  RETRY_COUNT=$((RETRY_COUNT+1))
  if [ $RETRY_COUNT -ge $MAX_RETRIES ]; then
    echo "Timed out waiting for FastAPI after 15 seconds. Starting nginx anyway."
    break
  fi
  echo "Waiting for FastAPI... ($RETRY_COUNT/$MAX_RETRIES)"
  sleep 1
done

if [ $RETRY_COUNT -lt $MAX_RETRIES ]; then
  echo "FastAPI is up and running"
fi

# Start nginx in the foreground
echo "Starting nginx..."
exec nginx -g 'daemon off;'



At the moment, there is no cost for using suspend. And while I obviously can’t say that will never change, I can say that I’m unaware of any plans to change it. It is something we encourage people to use; the primary reason it is not the default is that it can cause problems with some applications that are sensitive to clock issues.

I very much like your start script; it solves the problem nicely. That hadn’t occurred to me. I’ll try to remember to point others with similar issues to your solution.


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.