Cloudflare + Fly for better performance

I recently split tested a few configurations against each other and found one interesting result: I tested my Fly deployment directly against the same app proxied through Cloudflare. Surprisingly, CF + Fly performs better on all of my core response-time metrics.

The test comprised 30k requests in total.

Fly alone
Avg: 571.8ms
Median: 216ms
95p: 1800ms

CF + Fly
Avg: 489.5ms
Median: 197ms
95p: 1500ms
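To put these numbers in relative terms, here is a small Python sketch that computes the percentage reduction for each metric. It uses only the figures reported above; the variable and function names are just for illustration.

```python
# Reported response times (ms) from the 30k-request test above.
fly_alone = {"avg": 571.8, "median": 216.0, "p95": 1800.0}
cf_plus_fly = {"avg": 489.5, "median": 197.0, "p95": 1500.0}


def relative_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100.0


# Percentage reduction per metric, rounded to one decimal place.
improvements = {
    metric: round(relative_reduction(fly_alone[metric], cf_plus_fly[metric]), 1)
    for metric in fly_alone
}

if __name__ == "__main__":
    for metric, pct in improvements.items():
        print(f"{metric}: {pct}% lower through CF")
```

This works out to roughly 14.4% (avg), 8.8% (median) and 16.7% (p95), so the tail improves more than the median. In both setups the mean sits well above the median, which suggests a long tail of slow requests.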

I made sure to turn off caching on CF to keep the test fair. I have no clear idea why this happens, other than Cloudflare having some connection/routing magic to users that Fly doesn't have yet. On the CF side, I did enable HTTP/3 (QUIC), which might make the difference.

Any ideas?

Just an idea (I really don't know much about this), but maybe the combined connection from the client to Cloudflare and then from Cloudflare to Fly is better than the direct connection from the client to Fly. So perhaps Cloudflare is closer to the client, and Cloudflare then has a great connection to Fly?

That would be my guess too.

They’ve optimized routing enormously over the years. They also have more locations. Transit between datacenters is usually much faster than between users and datacenters. I also believe we share some of the same datacenters with them, so they can probably reach our servers with just a few hops.

Of course, we’re adding more locations as time goes on.

We generally don't recommend adding more layers (such as Cloudflare) to your stack because of the added complexity: figuring out where a problem is happening can be hard with too many layers. However, CF is pretty reliable, so this is fine if you're exposing an HTTP service and you don't mind having your DNS hosted with them.

Do we really need Cloudflare? Its main advantages are DNS hiding, DDoS mitigation, and caching. How much of this does Fly handle natively? I too am on the fence about adding an extra DNS hop through Cloudflare.

We do network-level DDoS mitigation, which is enough for most apps. We don't do HTTP-level DDoS mitigation, so targeted attacks that "look like" normal traffic will still hit your app. High-risk apps probably need HTTP-level DDoS protection, either from someone like Cloudflare or with another layer on Fly.

You will probably be fine without Cloudflare. That said, they terminate TLS in more locations than we do, so if you have heavy usage from regions where we don't have a presence, we'll be slower.

One other reason I can think of for using Cloudflare is the ability to use the s-maxage directive in the Cache-Control header, which targets shared caches, so the origin server doesn't have to serve the same response to multiple users. Unless I'm mistaken, Fly doesn't support this itself.
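For reference, s-maxage is a directive inside the Cache-Control response header rather than a header of its own. It applies only to shared caches (a CDN, or a Varnish/Nginx node) and overrides max-age there, while `private` and `no-store` keep a shared cache from storing the response at all. Below is a minimal Python sketch of how a shared cache might derive its TTL. It deliberately ignores `Expires`, `no-cache`, heuristic freshness, and field-qualified forms like `private="Set-Cookie"`, and the function names are hypothetical.

```python
from typing import Optional


def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header into {directive: argument-or-None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_ttl(cache_control: str) -> Optional[int]:
    """Seconds a shared cache may keep the response.

    Returns None when a shared cache must not store it, or when no explicit
    lifetime is given (the cache then falls back to its own policy).
    """
    d = parse_cache_control(cache_control)
    if "no-store" in d or "private" in d:
        return None  # simplification: any `private` is treated as uncacheable
    for directive in ("s-maxage", "max-age"):  # s-maxage wins for shared caches
        if d.get(directive) is not None:
            try:
                return int(d[directive])
            except ValueError:
                return None
    return None
```

With `public, max-age=60, s-maxage=3600`, browsers keep the page for 60 seconds while the shared cache in front of the origin keeps it for an hour.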

It’s true! If you want HTTP caching putting a CDN in front is helpful.

It's trivial to set up a caching proxy on Fly. This Fly blog post discusses the approach in detail. It's worth looking at a few other details to decide whether this would make sense for you.

Costs

Cloudflare's cheapest paid plan costs $20/mo and includes a number of features beyond HTTP caching (WAF, Workers, image optimization, etc.). Running Varnish in all 19 Fly regions on the smallest VM size would cost $37/mo. Most apps don't need that many regions, so you might consider the cost difference here negligible.
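The Varnish figure works out to roughly $1.95 per VM per month. Assuming the cost scales linearly with the number of regions (one smallest-size VM per region, as in the estimate above, at the prices quoted here), a quick Python sketch for comparing a smaller footprint against the $20/mo plan:

```python
# Figures quoted above; the per-VM cost is derived, assuming linear scaling.
VARNISH_ALL_REGIONS_USD = 37.0
FLY_REGIONS = 19
CLOUDFLARE_CHEAPEST_PAID_USD = 20.0

per_vm_usd = VARNISH_ALL_REGIONS_USD / FLY_REGIONS


def varnish_monthly_cost(regions: int) -> float:
    """Estimated monthly cost of one small Varnish VM in each of `regions` regions."""
    if not 0 < regions <= FLY_REGIONS:
        raise ValueError(f"regions must be between 1 and {FLY_REGIONS}")
    return round(per_vm_usd * regions, 2)


def break_even_regions() -> int:
    """Largest region count whose Varnish cost stays at or below the CF plan price."""
    return int(CLOUDFLARE_CHEAPEST_PAID_USD // per_vm_usd)


if __name__ == "__main__":
    print(varnish_monthly_cost(3), break_even_regions())
```

Under these assumptions, three regions cost about $5.84/mo, and up to 10 regions stay under the $20 plan price, before counting any of the extra features Cloudflare bundles.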

As mentioned above, however, you would need to host your DNS with Cloudflare, or upgrade to their $200/mo plan to allow CNAME-based access. Fastly bottoms out at $50/mo but supports CNAME setups.

Support

A hidden cost in both cases, though, is the extra support surface area. Over the years I've had problems with both of these providers, including phantom 500 errors, that are hard to debug in their black-box setups. In one case, an incorrect SNI hostname setting took about a month to figure out across the Fastly and Netlify support teams. This would have been trivial to debug with full access to the caching nodes.

CDN support for low-end customers can also take days to respond effectively to an issue. Since your entire app must be proxied through them, this is worth considering. While Fly would not necessarily support your node configuration directly, its support for the underlying infrastructure gives you a lot of coverage.

Cache hit ratios

The article above lists optimizing cache hit ratios across multiple cache nodes as a problem that CDNs solve out of the box.

Consider that this may not be necessary if your origin can handle a few extra uncached hits. Say you have 3 nodes in 3 regions, and you expire a single page. That means 3 uncached origin fetches, maximum, as that page gets served up again across the regions. Maybe your app is fine with this! Many apps would be fine with just two nodes for redundancy in case one falls over.
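To make the 3-nodes-in-3-regions example concrete, here is a toy Python simulation. Each node keeps its own independent cache, as separate Varnish instances would; the `Origin` and `CacheNode` classes and the `/pricing` page are purely illustrative. After a page is expired everywhere, each node misses at most once, so origin fetches are bounded by the node count no matter how many requests arrive.

```python
import random


class Origin:
    """Stand-in for the app server; counts how often it renders a page."""

    def __init__(self) -> None:
        self.fetches = 0

    def render(self, path: str) -> str:
        self.fetches += 1
        return f"<html>{path}</html>"


class CacheNode:
    """One regional cache with its own storage (nothing is shared between nodes)."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.store: dict = {}

    def get(self, path: str) -> str:
        if path not in self.store:  # miss: fetch from the origin once
            self.store[path] = self.origin.render(path)
        return self.store[path]

    def expire(self, path: str) -> None:
        self.store.pop(path, None)


def origin_fetches_after_expiry(num_nodes: int, num_requests: int, seed: int = 0) -> int:
    """Expire one page on every node, replay requests, count origin fetches."""
    origin = Origin()
    nodes = [CacheNode(origin) for _ in range(num_nodes)]
    for node in nodes:
        node.get("/pricing")  # warm every cache
        node.expire("/pricing")  # then expire the page everywhere
    before = origin.fetches
    rng = random.Random(seed)
    for _ in range(num_requests):
        rng.choice(nodes).get("/pricing")  # each user lands on some region
    return origin.fetches - before


if __name__ == "__main__":
    print(origin_fetches_after_expiry(3, 1000))
```

With 3 nodes and 1,000 requests this reports 3 origin fetches: one per node, matching the "3 uncached origin fetches, maximum" reasoning.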

Cache expiration

The last piece, not covered anywhere, is distributed cache expiration. There are various approaches to this out there, but I'd consider it a necessary part of any HTTP cache beyond a simple asset cache.
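One simple approach is to fan a purge out to every cache node and retry the nodes that fail. The sketch below is a hypothetical, in-memory Python illustration of that idea, not the tool the Fly team is working on. In a real setup, each per-node callable would send the purge over the network, and how Varnish or Nginx accept purges depends on how they are configured.

```python
from typing import Callable, Dict, List


def purge_everywhere(
    path: str,
    nodes: Dict[str, Callable[[str], None]],
    max_attempts: int = 3,
) -> List[str]:
    """Send a purge for `path` to every node; return the nodes that still failed.

    `nodes` maps a node name to a hypothetical purge callable that raises on error.
    """
    pending = list(nodes)
    for _ in range(max_attempts):
        failed = []
        for name in pending:
            try:
                nodes[name](path)
            except Exception:
                failed.append(name)
        if not failed:
            return []
        pending = failed  # retry only the nodes that failed this round
    return pending
```

A non-empty return value means some regions may still serve the stale page, which is exactly the case a real expiration tool has to surface or keep retrying.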

We’re working on a small tool that can be deployed alongside Varnish or Nginx to assist with distributed cache expiration. Once it’s ready, we’ll post it here to try out in front of your apps!

Control

If you like having Git-based control over your whole stack, this is a good starting point. Also, learning the caching and request manipulation tools present in software like Nginx or Varnish can greatly enhance your skill set!

Great arguments. I think my biggest issue is that I'm a UI developer, and this part of the stack is where I'm least comfortable, so having something that is more or less managed for me with a nice UI is quite compelling despite some of the trade-offs mentioned :sweat_smile:

Yeah, fair enough :laughing: Trying to replicate everything a CDN does would be quite a feat, and not 100% doable with open source tools alone today.

That said, I believe there could be a comfy middle road that would build on this powerful OSS software in the open, with a functional UI/API, to cover the majority of the use cases that people care about.

One can dream…

Sounds like a fun project. :slight_smile:

I’m also using Cloudflare and Fly.io together in a personal project, and I’m loving it, but my setup is slightly different.

Regarding @michael1’s original numbers, it’s pretty amazing the Cloudflare proxy can sometimes save time (with caching disabled) even though it adds a network hop. That said, I’ve definitely seen it do the opposite (add latency), as it sounds like @michael1 originally expected it would. Maybe this means the rest of my post is stating the obvious, but I was surprised by some of the details.

First of all, Cloudflare’s DNS service is faster than almost anything else out there (typically <15ms vs. 50-100ms+ for other providers), so CF is my DNS provider.

However, I've decided not to let CF proxy traffic to my Fly containers (the DNS record points straight at the Fly IP address instead). Partly that's because I don't particularly need CF's proxy/caching, but it's also a matter of geography/geometry: it's just really hard to improve on the ~30ms it takes me to ping the closest Fly data center (this latency varies by user location relative to Fly data centers, of course), even if the closest CF data center is closer to me than the Fly one. Luckily for me, those data centers happen to be in the same city, but the extra hop through CF is still noticeable.

My general understanding is that, unless the CF proxy is located literally in the same data center, adding another hop along the way can only add geographical/network distance to the already great round-trip times you see when connecting directly to the Fly container, assuming Cloudflare ends up talking to the Fly container rather than returning a cached response. That assumption isn't exactly fair to Cloudflare, since caching is one of the main ways CF makes up for the extra latency their proxy adds, but it's important here.
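The geometry argument can be made concrete with a back-of-the-envelope model. Light in optical fiber covers roughly 200 km per millisecond, so round-trip time is bounded below by twice the path length divided by that speed. Going through a proxy makes the path client → proxy → origin, which by the triangle inequality is never shorter than client → origin. A Python sketch with made-up distances; real RTTs add routing detours, queuing, and processing on top, and this only models propagation, not handshakes or connection reuse.

```python
FIBER_KM_PER_MS = 200.0  # approx. speed of light in fiber (about 2/3 of c)


def min_rtt_ms(*legs_km: float) -> float:
    """Lower bound on round-trip time over a path made of the given legs (km)."""
    return 2 * sum(legs_km) / FIBER_KM_PER_MS


# Hypothetical distances: the user is 300 km from the Fly region, the CF edge is
# 50 km from the user, and that CF edge is 280 km from the Fly region.
direct = min_rtt_ms(300)
via_proxy = min_rtt_ms(50, 280)

if __name__ == "__main__":
    print(f"direct >= {direct:.1f} ms, via proxy >= {via_proxy:.1f} ms")
```

Even with the CF edge much closer to the user, the proxied path is longer overall (330 km vs. 300 km), so its propagation floor is higher.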

In case you’re wondering, I first discovered this performance surprise by “pausing” Cloudflare on my site, and then noticing my latencies dropped back to ~30ms (at best), from 50-80ms with CF in the loop. Note: these numbers are from memory, so please don’t quote me!

I’ve created a few subdomains that I let Cloudflare handle with edge workers, but the main domain is hosted directly by Fly. Ironically, I’m now using Fly to cache requests to these CF-powered subdomains, because my Fly containers can maintain in-memory/on-disk caches, whereas CF workers don’t live very long, so in-memory caching is less feasible/useful there.

With all of that said, if your Fly containers are swamped/slow and their responses can be effectively HTTP-cached, I would expect putting Cloudflare in front of Fly to speed things up and reduce load on the origin server, but (for better or worse) I don’t have those problems yet.

On a personal note, this is my first post in these forums, so I hope this has been fun/interesting content for folks to read. I see a few familiar avatars. :wave: @kentcdodds! Happy flying, everyone.

Oh wow that’s a great post. :raised_hands:

In general, I think avoiding extra layers is good. But TLS handshakes might be faster if you run through a CDN with a lot more locations.
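A rough model of why terminating TLS closer to the user can win even with an extra hop: a fresh HTTPS request over TCP with TLS 1.3 spends about one round trip on the TCP handshake and one on TLS before the request itself takes a third round trip, and all of these are paid to wherever TLS terminates. If a nearby CDN edge already holds a warm connection to the origin, only the final request crosses the longer edge-to-origin leg. A Python sketch under those assumptions (TLS 1.3 without 0-RTT resumption, no DNS or server processing time, hypothetical RTT values):

```python
def first_response_ms(rtt_to_tls_ms: float, backhaul_rtt_ms: float = 0.0,
                      handshake_rtts: int = 2) -> float:
    """Approximate time to first response on a brand-new connection.

    rtt_to_tls_ms: RTT from the user to wherever TLS terminates.
    backhaul_rtt_ms: extra RTT from that terminator to the origin over an
        already-open connection (0 when no proxy sits in between).
    handshake_rtts: 2 for TCP + TLS 1.3, 1 for QUIC (HTTP/3), which combines them.
    """
    # Handshake round trips plus one round trip for the request/response itself.
    return (handshake_rtts + 1) * rtt_to_tls_ms + backhaul_rtt_ms


if __name__ == "__main__":
    print(first_response_ms(80))                     # far-away TLS termination
    print(first_response_ms(20, backhaul_rtt_ms=60))  # nearby CDN edge, warm backhaul
    print(first_response_ms(20, 60, handshake_rtts=1))  # same, over HTTP/3
```

With these illustrative numbers, the direct path costs 240 ms while the nearby edge costs 120 ms (100 ms over HTTP/3), which is the kind of effect a CDN with many more locations can produce.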

CDN <-> Fly can get weird. We’ve seen Akamai send all traffic through Tokyo for a site, for example. They’re not really built to sit in front of an Anycast IP.

This is what I’m thinking too. But I will also wait and see whether they’re necessary.

Nice to see you too!

Disabled the Cloudflare DNS proxy and response times dropped from 500ms → 50ms. I just found out how to make a server-rendered app feel like an SPA.

I'm in India, and the app is hosted in the MAA region.

I have been experimenting with Cloudflare + Fly for reasons similar to those discussed previously, primarily Cloudflare's WAF and bot detection.

Going down this route still left my service publicly exposed via its fly.dev domain. Originally I added a middleware to my app to verify a shared secret, which I set as a header inside a Cloudflare Worker when proxying to my origin. That blocked direct access to the application but didn't feel like a great solution.
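A minimal sketch of that kind of shared-secret check, written as a plain WSGI middleware in Python. The header name `X-Origin-Auth` and the `ORIGIN_SECRET` environment variable are hypothetical; the Worker would have to add the same header when proxying. `hmac.compare_digest` is used so the comparison time doesn't reveal how much of the secret matched.

```python
import hmac
import os


class RequireOriginSecret:
    """WSGI middleware that rejects requests lacking the shared-secret header."""

    def __init__(self, app, secret: str):
        if not secret:
            raise ValueError("a non-empty shared secret is required")
        self.app = app
        self.secret = secret.encode()

    def __call__(self, environ, start_response):
        # WSGI exposes the "X-Origin-Auth" request header as HTTP_X_ORIGIN_AUTH;
        # header values arrive as latin-1-decoded str per PEP 3333.
        supplied = environ.get("HTTP_X_ORIGIN_AUTH", "").encode("latin-1")
        if not hmac.compare_digest(supplied, self.secret):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"forbidden"]
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Hypothetical secret, e.g. set with `fly secrets set ORIGIN_SECRET=...`.
    app = RequireOriginSecret(hello_app, os.environ["ORIGIN_SECRET"])
    make_server("0.0.0.0", 8080, app).serve_forever()
```

The weakness is that the app stays publicly routable and every request still reaches it before being rejected, which is likely part of why this didn't feel like a great solution.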

I later found out about Cloudflare Tunnel, which basically reverses the traditional CDN → origin request lifecycle by maintaining a persistent tunnel from origin → CDN.

The recommended approach for this is to run the cloudflared daemon in a sidecar container, although it felt a bit over-the-top to run a separate Fly app just for this daemon.

I ended up using s6-overlay to run my app + cloudflared in the same container (a bad idea according to some, though I've not had any problems with it :grimacing:).

If you want to take a look at how I've configured this, I've created an example at jamesmbourne/fly-cloudflared on GitHub. You'll just need to configure a Cloudflare tunnel and then set the values defined in the docker-compose.yml as env vars/secrets in Fly.

With this approach, you can remove the services.ports section from your fly.toml so the app is no longer publicly accessible, but it should still be reachable via the Cloudflare tunnel.

I'm thinking about writing a more in-depth blog post on this, so let me know if you'd be interested!