Assets 404 on deploy

I have a problem that I’m hoping others here have a solution to.

When I deploy my app (rolling), users will eventually hit a new server that says to serve “asset-2.js”. The request is then sent to the backend, but lands on an old server. The old server only has “asset-1.js”, so it 404s the request.

Ideally, I would like the user to always receive an asset if any server has it.

What I’ve Done

I thought maybe [[statics]] would help here. I am sure that it’s set up properly because I see the fly cache-hit information on the response. That would never come from my app, so I assume it’s the static server.

When I do the deploy, though, I still receive the 404. It’s almost like the static is not available until after the app server is live.

If it’s possible that I didn’t set up statics properly, I would love to understand how I can check that. The guest_path reflects a real path available in my container.

Different Ideas

In the past, I solved this problem on a Rails app by publishing all assets to S3 on deploy. Then, I had a CDN in front of the bucket that pulled from S3. Because the assets were pushed before the deploy, there was a 100% success rate. But it’s quite a few moving parts, and I’d like a simpler solution if possible.

Rely on http caching instead of changing names, if you’d like the old gen server to not 404?

A common way to bust caches is to append a query string: asset.js?oldhex1 / asset.js?newhex2. In your case, if the request for newhex2 reaches old gen, then it is left for it to decide whether to serve a 404 or not.

Usually, one would not use explicit random hexes in query strings but instead rely on ETags (mdn), which almost all CDNs and blob stores support, along with Cache-Control directives.
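
Since the app in this thread is a Phoenix app, here is a minimal sketch of what ETag-based revalidation could look like with Plug.Static. The module name `Sketch.StaticAssets`, the `:my_app` OTP app name, and the chosen Cache-Control value are assumptions for illustration, not taken from the original setup.

```elixir
defmodule Sketch.StaticAssets do
  # Hypothetical pipeline; in a Phoenix app this plug would normally live in endpoint.ex.
  use Plug.Builder

  plug Plug.Static,
    at: "/",
    # Assumed OTP app name; use the app that owns priv/static.
    from: :my_app,
    only: ~w(assets fonts images),
    # Plug.Static adds an ETag to responses for requests without a `?vsn=` param.
    # This value (an assumption) asks clients and caches to revalidate each time,
    # so an unchanged file comes back as a cheap 304 instead of a full download.
    cache_control_for_etags: "public, max-age=0, must-revalidate"
end
```

With stable file names, a request that lands on an old machine gets the old file rather than a 404, which is the trade-off being suggested above.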

Speaking of CDNs, I’d not use [[statics]] if I were looking for HTTP-cache-compliant delivery: Does Fly serve `Cache-Control` origin responses off of its edge cdn? - #2 by jsierles

Yeah, echoing Ignoramus: you will need to do something at the Phoenix/Plug router level with fly-replay.

One suggestion was to prepend a sub-subdomain to asset URLs, like [vmid].assets.app.com, and then use fly-replay to tell Fly to retry the request on the correct machine ID. This should be fairly achievable via a plug and a URL helper.

We are moving our system to Apps v2, which is currently in pre-release. It should deploy much faster, which will ameliorate some of these problems.

[statics] are pretty basic, and this is one of the things they don’t account for. But the feature seems valuable. We’re going to look at making statics much better sometime this year.

Falling back to previous assets is important, and it’s on the list. That doesn’t help you today, but it’s definitely magic we think we can do.

This is really interesting. I don’t think that I know the correct machine ID, but elsewhere=true should, in theory, always reach a server that can respond.

edit: I see now that you’re suggesting to include that on the request. That’s interesting, but it may damage the ability to use a CDN. I’ll have to think through this a bit more.

What happens to fly-replay if the request chains together? So a 404 happens, fly-replay elsewhere=true is returned, then that continues 3 times until it finds the server. Will Fly cap-out on the number of requests? Will it try all machines and exclude future machines from the request?

I think [statics] seems really valuable for this (and static edge hosting). I’d love to see it built out to handle this case.

What I’m suggesting is that each server puts its appid/version in the URLs it generates for its assets, so:

<img src={"/images/seemsgood.png?appid=#{System.get_env("FLY_ALLOC_ID")}&vsn=#{MY_VSN}"} />

(do this in a smarter way)
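
One possible "smarter way" is a small helper that builds the query string in one place. This is only a sketch: the module name `Sketch.AssetUrl` and the function `versioned/3` are hypothetical, and it assumes `FLY_ALLOC_ID` is set in the machine's environment, as in the snippet above.

```elixir
defmodule Sketch.AssetUrl do
  @moduledoc "Hypothetical helper that tags asset URLs with the serving instance and version."

  # `instance` defaults to FLY_ALLOC_ID, read at call time.
  # `vsn` is whatever version string the app chooses (for example a release tag).
  def versioned(path, vsn, instance \\ System.get_env("FLY_ALLOC_ID")) do
    params =
      if instance in [nil, ""] do
        # Outside Fly (for example local dev) there is no instance to point at.
        [{"vsn", vsn}]
      else
        [{"appid", instance}, {"vsn", vsn}]
      end

    # A list of pairs keeps the parameter order stable; values are URL-encoded.
    path <> "?" <> URI.encode_query(params)
  end
end
```

In a template this would be used as `<img src={Sketch.AssetUrl.versioned("/images/seemsgood.png", vsn)} />`, where `vsn` comes from wherever the app stores its version.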

And then add a Plug before Plug.Static to check whether those query params exist. If the current vsn matches, let the request go through normally; if it doesn’t, halt with the fly-replay header pointing at that appid.
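
A rough sketch of such a plug follows. The module name `SketchWeb.Plug.AssetVersionReplay` and the `:vsn` option are hypothetical, and it assumes the `appid` query param carries an ID that Fly accepts in the `instance=` field of the fly-replay header.

```elixir
defmodule SketchWeb.Plug.AssetVersionReplay do
  @moduledoc "Hypothetical plug: replays asset requests built by a different app version."
  @behaviour Plug
  import Plug.Conn

  @impl true
  def init(opts), do: %{vsn: Keyword.fetch!(opts, :vsn)}

  @impl true
  def call(conn, %{vsn: current_vsn}) do
    conn = fetch_query_params(conn)

    case conn.query_params do
      # The URL was generated by this version: let Plug.Static handle it.
      %{"vsn" => ^current_vsn} ->
        conn

      # The URL was generated by another version: ask Fly to replay the
      # request on the instance named in the URL.
      %{"vsn" => _other, "appid" => instance} when instance != "" ->
        conn
        |> put_resp_header("fly-replay", "instance=" <> instance)
        |> send_resp(404, "")
        |> halt()

      # No version info in the URL: nothing to decide here.
      _ ->
        conn
    end
  end
end
```

It would go in endpoint.ex ahead of Plug.Static, for example `plug SketchWeb.Plug.AssetVersionReplay, vsn: "42"`. If the named instance has already been stopped, the replay has nowhere to go, so this only helps while both generations are still running.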

I did it more naively to avoid query params, but I just tested and it appears to work. Still too early to tell though.

With this approach, it naively sends fly-replay: elsewhere=true if a 404 occurs. Fly will bounce this around the cluster until it’s found (200), or all servers exhausted (502).

I don’t mind the 502s because that only happens for non-existent assets, but I could do something like “if the server is older than 10 minutes, don’t send the fly-replay header”. This would give me 404s back.
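
The "older than 10 minutes" check could be sketched as a small pure function. The module name `Sketch.ReplayWindow` is hypothetical, and how the boot timestamp gets recorded (for example at application start) is left open as an assumption.

```elixir
defmodule Sketch.ReplayWindow do
  @moduledoc "Hypothetical helper: should a 404 on an asset path still ask Fly to replay?"

  # Assumed window: only replay during the first 10 minutes after boot.
  @window_ms :timer.minutes(10)

  # `booted_at_ms` is a monotonic timestamp captured once when the server started.
  # `now_ms` is injectable so the function can be exercised without waiting.
  def replay?(booted_at_ms, now_ms \\ System.monotonic_time(:millisecond)) do
    now_ms - booted_at_ms < @window_ms
  end
end
```

The not-found plug would then add the fly-replay header only when `Sketch.ReplayWindow.replay?(booted_at)` is true and return a plain 404 otherwise. One caveat with this rule as stated: a long-running old machine that receives a request for a new asset is already past its window, so the window may need to be measured from something other than that machine's own boot time.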

This plays well with CDN and Fly Statics as well, and maintains cache-ability between deploys that don’t change the asset.

I tested this by creating an asset manually from SSH console, on a single server. That file is returned 100% of the time when I request it.

# endpoint.ex

  plug Plug.Static,
    at: "/",
    from: :super,
    gzip: true,
    only: ~w(assets fonts images favicon.ico robots.txt)

  # Runs only for requests Plug.Static did not serve (it halts when it finds the file)
  plug SuperWeb.Plug.AssetNotFound, at: "/", only: ~w(assets fonts images favicon.ico robots.txt)

defmodule SuperWeb.Plug.AssetNotFound do
  import Plug.Conn
  alias Plug.Conn

  def init(opts) do
    %{
      at: Keyword.fetch!(opts, :at) |> Plug.Router.Utils.split(),
      only: {Keyword.fetch!(opts, :only), []}
    }
  end

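  def call(%Conn{} = conn, %{at: at, only: only_rules}) do
    # Path segments relative to the `:at` mount point
    segments = subset(at, conn.path_info)
    asset_path? = allowed?(only_rules, segments)

    if asset_path? do
      # Plug.Static did not find the file on this machine, so ask Fly to
      # replay the request on another machine
      conn
      |> put_resp_header("fly-replay", "elsewhere=true")
      |> send_resp(404, "Asset not found")
      |> halt()
    else
      conn
    end
  end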

  # Taken from Plug.Static
  defp subset([h | expected], [h | actual]), do: subset(expected, actual)
  defp subset([], actual), do: actual
  defp subset(_, _), do: []

  # Taken from Plug.Static, `:only` option needs put into this tuple format even though not used now
  defp allowed?(_only_rules, []), do: false
  defp allowed?({[], []}, _list), do: true

  defp allowed?({full, prefix}, [h | _]) do
    h in full or (prefix != [] and match?({0, _}, :binary.match(h, prefix)))
  end
end

That looks good to me! Especially since there is only a relatively small window where this could happen.