Cacheable fly-replay, and better subdomain routing

One of the platform features we love the most at Fly is fly-replay. It allows you to basically re-program our edge proxy to dynamically reroute HTTP requests however you’d like. For example, it is often used to implement subdomain routing for app-per-customer products, such as per-user dev environments, where one “router” app receives all requests targeted at *.example.com and distributes them to apps assigned to specific users.

There is just one slight problem with this model: it’s kind of expensive! It’s expensive for you, our customer, because the router app needs to effectively handle all requests served by your product, in addition to the target apps themselves. It’s expensive for us as well, because every replay needs to propagate backwards to the source edge and then be forwarded to the correct app / machine. In most cases, the replay target doesn’t really change, at least not for a couple of minutes, so sending these requests to the original router app is simply unnecessary work for both our proxy and your app. It also adds unnecessary latency when the replay target is far from your router app.

To solve this, we settled on implementing a simple caching layer on top of fly-replay. This is completely optional and not used at all by default. However, if you think it will help with your app, here’s how to use it:

  1. Send your fly-replays as normal from the origin “router” app.
    a. Note that the state field is not supported; nor is fly-replay-src.
  2. In addition to fly-replay, send back the following headers:
    a. fly-replay-cache: example.com/some/path/*: Specifies where we should apply the cached fly-replay. This example rule would cause the cached fly-replay to be used on all paths under example.com/some/path/.
    b. fly-replay-cache-ttl-secs: 30: Specifies how long to cache the fly-replay header for, in seconds. This must be larger than 10 seconds since smaller values are usually not helpful.

(For more details on these headers, refer to our documentation)
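As a rough illustration of the steps above, here is a minimal sketch of how a router app might build these response headers. The helper name replay_headers, the app name customer-123, and the use of the app=<name> form of fly-replay are assumptions made for the example; adapt them to however your router already picks its targets.

```python
from __future__ import annotations


def replay_headers(target_app: str, host: str, path_prefix: str,
                   ttl_secs: int = 30) -> dict[str, str]:
    """Build the response headers for a cacheable fly-replay.

    Hypothetical helper: only the header names come from the post above.
    """
    # The post says the TTL must be larger than 10 seconds.
    if ttl_secs <= 10:
        raise ValueError("fly-replay-cache-ttl-secs must be larger than 10")

    # Turn "/some/path" into the wildcard pattern "example.com/some/path/*".
    prefix = path_prefix.strip("/")
    pattern = f"{host}/{prefix}/*" if prefix else f"{host}/*"

    return {
        # Assumed replay target syntax; use whatever your router sends today.
        "fly-replay": f"app={target_app}",
        "fly-replay-cache": pattern,
        "fly-replay-cache-ttl-secs": str(ttl_secs),
    }


if __name__ == "__main__":
    for name, value in replay_headers("customer-123", "example.com", "/some/path").items():
        print(f"{name}: {value}")
```

The router would attach these headers to its (otherwise empty) response, exactly as it does for an uncached fly-replay.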

Note that the proxy does not synchronize this cache proactively, so if requests reach us from, for example, a different region, you might still see a couple more requests for the same path even if you have sent a cache header before. However, each proxy instance should keep the cache around for at least the specified number of seconds, and cached replays will be passed up the proxy chain. In our testing, we’ve seen a dramatic reduction in the number of requests that had to be handled by the router app, sometimes by as much as 80% to 90%, even for comparatively low TTL settings.

With every cache, though, comes the problem of cache invalidation. If replay targets do eventually change for your app, you either need to set very short TTLs, depending on how fast you’d like the change to take effect, or you need to preemptively invalidate the cache you have previously set. This can be done by:

  • Sending another fly-replay to the “router” app when any of your target apps receive a “misfired” request
  • Sending fly-replay-cache: invalidate along with it.
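
A minimal sketch of what that could look like inside a target app, assuming the app knows which hostnames it currently serves. The function name misfire_response, the served_hosts set, and the app=<name> form of fly-replay are assumptions for illustration, not a prescribed API.

```python
from __future__ import annotations


def misfire_response(request_host: str, served_hosts: set[str],
                     router_app: str) -> dict[str, str] | None:
    """Return invalidation headers for a misfired request, or None.

    Hypothetical helper: a target app calls this before handling a request.
    """
    if request_host in served_hosts:
        # The request belongs here; handle it normally.
        return None

    # Misfire: hand the request back to the router and drop the cached replay.
    return {
        "fly-replay": f"app={router_app}",  # assumed replay target syntax
        "fly-replay-cache": "invalidate",
    }


if __name__ == "__main__":
    print(misfire_response("bob.example.com", {"alice.example.com"}, "router"))
```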

Even with cache in place, though, you might still want to run multiple instances for the “router” app, for reliability reasons. As mentioned above, the proxy does not attempt to proactively synchronize the cache across the fleet. The source of truth is still business logic in your router app, and you still need it to be online to serve all requests in a reliable manner. The cache doesn’t absolve the router app from reliability constraints; it does allow you to focus more on acquiring more users for multitenant products, instead of worrying about scaling up a router app for unnecessary load.


Hello, I tried using this feature by replaying requests to an echo endpoint, but I observed that the fly-replay-src header always exists and the timestamp keeps changing. I suspect that the cache is not actually working.

 https://boluo-site-staging[REMOVE THIS].fly.dev/api/info/echo

What is the header you’re returning from this path? The fly-replay-cache pattern needs to be a wildcard pattern that matches the current URI. For example, if the current URI is

example.com/some/path/

Then the only valid fly-replay-cache patterns would be

example.com/*
example.com/some/*
example.com/some/path/*

Right now there isn’t a way to specify an exact match, so a pattern such as example.com/some/path/echo wouldn’t work. You’ll need to use example.com/some/path/* instead.
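
To make the rule concrete, here is a small sketch that lists the wildcard patterns for a given host and path. It is only one reading of the example above (one pattern per directory prefix of the URI), not the proxy's actual matching code, and candidate_patterns is a made-up name.

```python
from __future__ import annotations


def candidate_patterns(host: str, path: str) -> list[str]:
    """List wildcard patterns covering a URI, broadest first.

    Assumption: a pattern is valid when it is a directory prefix of the
    URI followed by "/*", as in the example above.
    """
    # Keep only the directory part: everything before the last slash.
    directory = path.rsplit("/", 1)[0]
    segments = [s for s in directory.split("/") if s]

    patterns = [f"{host}/*"]
    prefix = host
    for segment in segments:
        prefix = f"{prefix}/{segment}"
        patterns.append(f"{prefix}/*")
    return patterns


if __name__ == "__main__":
    for pattern in candidate_patterns("example.com", "/some/path/echo"):
        print(pattern)
```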

This can definitely use a bit more error logging on the user side, though.

After printing the log, I found that the cause was a wrong hostname (somehow request.nextUrl.hostname == "0.0.0.0"). It was fixed once I switched to reading the host directly from the request header. Still, I think it would be better if the host part could be omitted:

fly-replay-cache: /api/*