Statics no longer working

Up until today, statics were being served properly for my app. However, today, everything my site tries to serve via statics is 404ing. I’m not sure if I did something wrong, though I definitely didn’t touch fly.toml or Dockerfile.

Relevant part of fly.toml:

[[statics]]
  guest_path = "/app/static"
  url_prefix = "/static"

And of Dockerfile:

WORKDIR /app

# <snipped installation of supercronic>

COPY poetry.lock pyproject.toml /app/

RUN pip3 install poetry
RUN poetry install --no-root

COPY . .

# snipped DJANGO_SETTINGS_MODULE and DJANGO_SECRET_KEY

RUN poetry run python _manage.py collectstatic --noinput

When I request https://pdhdata.com/static/bs/bootstrap.min.css, here are the response headers:

HTTP/2 404 Not Found
content-type: text/html; charset=utf-8
x-frame-options: DENY
content-encoding: gzip
x-content-type-options: nosniff
referrer-policy: same-origin
cross-origin-opener-policy: same-origin
server: Fly/1e93abda (2022-12-08)
fly-cache-status: MISS
date: Tue, 13 Dec 2022 11:15:24 GMT
via: 2 fly.io
fly-request-id: 01GM5K4M1DMPPP3VNEQTH9XDQ1-iad
X-Firefox-Spdy: h2

Lastly, I used flyctl ssh console to log in. Checking /app/static, I see all the expected files.

% flyctl ssh console
Connecting to fdaa:0:dae4:a7b:93:8bc0:e630:2... complete
# cd app/static/bs
# ls
bootstrap.bundle.min.js      bootstrap.min.css
bootstrap.bundle.min.js.map  bootstrap.min.css.map

I’m not sure what to do or how to troubleshoot. I did try a no-op redeploy of my app, but that didn’t help.
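One way to narrow this down is to check which layer produced the 404. The response above carries x-frame-options, referrer-policy, and cross-origin-opener-policy headers, which Django's default middleware normally sets. That suggests the request may have fallen through to the app instead of being answered by the statics handler. Below is a rough sketch of that check. The names `probe`, `reached_app`, and `APP_HEADERS` are hypothetical, and the header list is a heuristic based on Django defaults, not anything Fly documents.

```python
import urllib.error
import urllib.request
from typing import Dict, Tuple

# Assumption: headers that Django's default middleware usually adds to responses.
# Adjust this tuple if the app's middleware settings differ.
APP_HEADERS = ("x-frame-options", "referrer-policy", "cross-origin-opener-policy")


def probe(url: str, timeout: float = 10.0) -> Tuple[int, Dict[str, str]]:
    """GET the URL and return (status, lower-cased headers), even for 4xx/5xx."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, {k.lower(): v for k, v in response.headers.items()}
    except urllib.error.HTTPError as err:
        # Error responses raise HTTPError but still carry a status code and headers.
        return err.code, {k.lower(): v for k, v in err.headers.items()}


def reached_app(headers: Dict[str, str]) -> bool:
    """Heuristic: True if any app-level header is present in the response."""
    return any(name in headers for name in APP_HEADERS)


# A few of the headers from the 404 response quoted above, lower-cased.
observed = {
    "content-type": "text/html; charset=utf-8",
    "x-frame-options": "DENY",
    "referrer-policy": "same-origin",
    "server": "Fly/1e93abda (2022-12-08)",
}
print("likely served by the app:", reached_app(observed))
```

Calling `probe("https://pdhdata.com/static/bs/bootstrap.min.css")` and passing the returned headers to `reached_app` would run the same check against the live URL.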

After much keyboard mashing and wild guessing, I discovered flyctl config display. It in turn showed me JSON like this:

// ...
    "statics": [
        {
            "cache_key": "_static__app_static",
            "guest_path": "/app/static",
            "processes": [],
            "url_prefix": "/static"
        }
    ]
}

Since I never specified cache_key, I guess _static__app_static was system-generated. Groping blindly in the dark, I added a cache-busting cache_key with today’s date. A redeploy later and I’m serving statics again.
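For anyone checking several apps, here is a small sketch that pulls the statics entries out of that JSON so the effective cache_key is easy to see. It assumes the output of flyctl config display has been saved as plain JSON. `statics_entries` is a hypothetical helper name, and the sample below is a minimal stand-in for the real output, which has more keys.

```python
import json
from typing import List, Optional, Tuple


def statics_entries(config_json: str) -> List[Tuple[Optional[str], Optional[str], Optional[str]]]:
    """Return (url_prefix, guest_path, cache_key) for each statics entry."""
    config = json.loads(config_json)
    return [
        (entry.get("url_prefix"), entry.get("guest_path"), entry.get("cache_key"))
        for entry in config.get("statics", [])
    ]


# Minimal stand-in for the JSON shown above; only the statics section is kept.
sample = """
{
    "statics": [
        {
            "cache_key": "_static__app_static",
            "guest_path": "/app/static",
            "processes": [],
            "url_prefix": "/static"
        }
    ]
}
"""

for url_prefix, guest_path, cache_key in statics_entries(sample):
    print(f"{url_prefix} -> {guest_path} (cache_key={cache_key})")
```

If the cache_key printed here is still the generated default, setting an explicit one in the [[statics]] block of fly.toml and redeploying is the workaround described above.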

I still suspect there’s a system bug here, but am personally back in action.

Oh, I just thought to check another Fly app, and it too is no longer serving statics. It hasn’t been touched in weeks, so now I’m suspecting a change on the platform side.

Hi. We’re also getting 404 for all Fly-managed static files across multiple orgs, multiple apps.

This has been happening on and off, Monday and today.

Sometimes rebuilding fixes it, but more recently it has not.

Please help!

Example Fly.toml file #1

app = "..."

kill_signal = "SIGINT"
kill_timeout = 5

[deploy]
  release_command = "make release"
  strategy = "rolling"

[env]
  PORT = "8080"

[[services]]
  internal_port = 8080
  protocol = "tcp"

  [services.concurrency]
    hard_limit = 25
    soft_limit = 20

  [[services.ports]]
    handlers = ["http"]
    port = "80"
    force_https = true

  [[services.ports]]
    handlers = ["tls", "http"]
    port = "443"
    
  [[services.http_checks]]
    grace_period = "30s"
    interval = 10000
    method = "get"
    path = "/"
    protocol = "http"
    timeout = 3000
    restart_limit = 6
    tls_skip_verify = true
    [services.http_checks.headers]

[[statics]]
  guest_path = "/app/static"
  url_prefix = "/static"

Example Fly.toml file #2

app = "..."

kill_signal = "SIGINT"
kill_timeout = 5

[deploy]
  release_command = "make release"
  strategy = "rolling"

[env]
  PORT = "8080"
  BASE_URL = "https://alpha.hotosm.org"

[[services]]
  internal_port = 8080
  protocol = "tcp"

  [services.concurrency]
    hard_limit = 25
    soft_limit = 20

  [[services.ports]]
    handlers = ["http"]
    port = 80
    force_https = true

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    grace_period = "30s"
    interval = "30s"
    restart_limit = 10
    timeout = "10s"

[[statics]]
  guest_path = "/app/static"
  url_prefix = "/static"

We found an issue with our statics garbage collector that would make it too aggressive under certain conditions.

We’re issuing a fix and looking at restoring statics that were deleted.

@jerome I believe I’m having the same issue.

Also having the same issue, with a Django app.
Thought it was some issue on my end, but the app has been deployed for over 15 days without issue until today.

Hope a fix will come soon.

Apologies for the delay, the fix is being rolled out now.

Looks like it’s fixed now.

Yeah, this is fixed for me now - thanks for getting on it!

Following up on the issue raised by my colleague @janbaykara.

This is still an issue across our entire suite of sites. There is now a daily stream of support requests from clients around this issue.

Usually an adequate fix is to make a minor change (tweaking a README, for example) and redeploy. But could we get an update on this as soon as possible, please, @jerome? The majority of the sites are paid, so we can go through email channels if that is more appropriate.

Thanks so much in advance.

Also ccing in @dusty - thanks for your hard work on this.

We’re looking into this again this morning.

@alex-ck where are your apps deployed? I think they’re all in London, correct?

@jerome That’s right, lhr.

@alex-ck We found the issue and have rolled back a change so this should not happen again.

Thanks. Really appreciated.

Happening again right now, at least in the AMS region.
Probably related to deployments also failing there today.

Confirming that it seems to be only the AMS region.
Tested deploying to MAD or CDG, and static serving works there.

Broken in IAD again for me.