Cacheable fly-replay, and better subdomain routing

One of the platform features we love the most at Fly is fly-replay. It lets you re-program our edge proxy to dynamically reroute HTTP requests however you’d like. For example, it’s often used to implement subdomain routing for app-per-customer products, such as per-user dev environments, where one “router” app receives all requests targeted at *.example.com and distributes them to the apps assigned to specific users.

There is just one slight problem with this model: it’s kind of expensive! It’s expensive for you, our customer, because the router app has to handle every request served by your product, on top of the target apps themselves. It’s expensive for us as well, because every replay has to propagate back to the source edge and then be forwarded to the correct app and machine. In most cases, the replay target doesn’t actually change, at least not for a couple of minutes; sending these requests through the original router app is unnecessary work for both our proxy and your app. It also adds unnecessary latency when the replay target is far from your router app.

To solve this, we’ve implemented a simple caching layer on top of fly-replay. It’s completely optional and not used at all by default. If you think it will help your app, here’s how to use it:

  1. Send your fly-replays as normal from the origin “router” app.
    a. Note that the state field is not supported; nor is fly-replay-src.
  2. In addition to fly-replay, send back the following headers:
    a. fly-replay-cache: example.com/some/path/*: Specifies where we should apply the cached fly-replay. This example rule would cause the cached fly-replay to be used on all paths under example.com/some/path/.
    b. fly-replay-cache-ttl-secs: 30: Specifies how long to cache the fly-replay header, in seconds. This must be greater than 10 seconds, since smaller values are usually not helpful.

(For more details on these headers, refer to our documentation.)
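Putting the steps together, here is a minimal sketch of the response a router app might send. The `customer-<subdomain>` naming scheme and the `replay_headers` helper are illustrative assumptions, not part of the platform; only the three `fly-replay*` headers are real.

```python
def replay_headers(host: str, ttl_secs: int = 30) -> dict[str, str]:
    """Build a fly-replay response with caching enabled (sketch)."""
    # Hypothetical routing rule: map "alice.example.com" to app "customer-alice".
    subdomain = host.split(".", 1)[0]
    target_app = f"customer-{subdomain}"
    return {
        # Step 1: send fly-replay as normal to reroute the request.
        "fly-replay": f"app={target_app}",
        # Step 2a: apply the cached replay to every path under this host.
        "fly-replay-cache": f"{host}/*",
        # Step 2b: keep the cached replay around for 30 seconds
        # (the TTL must be greater than 10 seconds).
        "fly-replay-cache-ttl-secs": str(ttl_secs),
    }
```

Your router would attach these headers to its response; the proxy then serves subsequent matching requests from the cache instead of replaying through the router.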

Note that the proxy does not synchronize this cache proactively, so if requests reach us from, say, a different region, you might still see a few more requests for the same path even after you’ve sent a cache header. However, each proxy instance keeps the cache around for at least the specified number of seconds, and cached replays are passed up the proxy chain. In our testing, we’ve seen a dramatic reduction in requests that had to be handled by the router app, sometimes 80-90%, even with comparatively low TTL settings.

With every cache, though, comes the problem of cache invalidation. If replay targets do eventually change for your app, you either need to set extremely short TTLs, depending on how fast you’d like the change to take effect, or preemptively invalidate the cache you have previously set. This can be done by:

  • Sending another fly-replay to the “router” app when any of your target apps receives a “misfired” request
  • Sending fly-replay-cache: invalidate along with it.
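As a sketch, a target app that receives a request it no longer owns could respond like this. The router app name `"router"` and the `misfire_headers` helper are illustrative assumptions; `fly-replay-cache: invalidate` is the real invalidation signal described above.

```python
def misfire_headers(router_app: str = "router") -> dict[str, str]:
    """Headers a target app returns for a "misfired" request:
    bounce it back through the router and drop the stale cache entry."""
    return {
        # Re-route the misfired request to the router app...
        "fly-replay": f"app={router_app}",
        # ...and tell the proxy to forget the cached replay it holds.
        "fly-replay-cache": "invalidate",
    }
```

The router then re-runs its routing logic for the request and can respond with a fresh cached replay pointing at the correct target.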

Even with the cache in place, you might still want to run multiple instances of the “router” app for reliability. As mentioned above, the proxy does not attempt to proactively synchronize the cache across the fleet. The source of truth is still the business logic in your router app, and it still needs to be online to serve all requests reliably. The cache doesn’t absolve the router app of reliability constraints; it does let you focus on acquiring more users for your multitenant product instead of worrying about scaling up a router app for unnecessary load.
