IDS/IPS, WAFs, and Fly.io

From time to time we receive inbound questions asking us what Intrusion Detection/Prevention Systems or Web Application Firewalls we have deployed. These questions are usually from someone’s auditor or a prospective customer and, knowing that, we feel a twinge of sympathy because the answer to “What IDS/IPS/WAF is used by Fly.io?” that they’ll have to carry back to the auditor/prospect is “None”.

However, there’s a good reason for “None”: we use something better-suited to the task and, if you find yourself being asked the same questions, this is us giving you full permission (nay, encouraging you) to hand what we say here over to your auditors as explanation so you don’t have to worry about it.

A bit of exposition: ostensibly, Intrusion Detection Systems detect and warn of intrusions, while Intrusion Prevention Systems detect, warn of, AND prevent intrusions. What sort of monster wouldn’t want to detect or prevent intrusions? Without going into the effectiveness of IDS/IPS (that’s a load-bearing “ostensibly” back there), they generally operate by:

  1. Inspecting network traffic
  2. Building a signature/profile of “normal” traffic so as to be able to spot abnormal traffic

Neither operation makes sense on Fly.io.

For the first bullet, all traffic between your machines running on our workers travels over an encrypted mesh of point-to-point WireGuard connections. So it’s not like there’s a huge middlebox ingesting and categorizing all internal customer traffic (thankfully). We don’t want to look into your traffic, nor do we want the massive increase in security surface area (and risk) that would come from feeding arbitrary attacker-controlled data to Enterprise Software running at the core of our network. No thank you.

For the second bullet, we’re a public cloud. Anyone with a credit card and a few minutes can speedrun deploying any manner of solution on Fly.io. What does “normal” look like when any Firecracker micro-VM running on any worker in our fleet could drastically change its CPU load, listening ports, and network traffic volume, sources, and destinations from minute to minute? Yes, we have control backplanes on our edge and compute workers, but those backplanes are also evolving as our fleet evolves. An IDS looking for a signal at the network level would either spew false positives ad infinitum or never alert, because there would never be a stable baseline to measure deviations against.
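To make the baseline problem concrete, here is a toy deviation check. It is a hypothetical sketch (a plain z-score test with an assumed threshold of 3), not how any particular IDS product works, and the traffic numbers are made-up illustrative values rather than Fly.io data. The same rule trips on a legitimate change when the baseline is quiet, and misses a large spike when the baseline is noisy.

```python
import statistics

def is_anomalous(history: list[float], value: float, threshold: float = 3.0) -> bool:
    """Return True if `value` is more than `threshold` standard deviations
    away from the mean of `history` (a simple z-score test)."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # A perfectly flat baseline: any change at all counts as a deviation.
        return value != mean
    return abs(value - mean) / stdev > threshold

# Illustrative requests-per-minute samples (made-up numbers, not Fly.io data).
quiet_host = [100, 102, 98, 101, 99]    # one tenant, steady load
busy_host = [10, 5000, 40, 12000, 300]  # tenants deploying and scaling freely

# Stable baseline: a customer legitimately launching a new app trips the alarm.
print(is_anomalous(quiet_host, 500))    # True  (false positive)
# Volatile baseline: a large spike hides inside the normal noise.
print(is_anomalous(busy_host, 15000))   # False (missed signal)
```

On a multi-tenant worker, "normal" looks a lot more like `busy_host` than `quiet_host`, which is exactly the situation where this kind of check stops being useful.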

But this isn’t to say we don’t take steps to detect malicious operations: we just don’t do it at the level that IDS/IPS operate. Rather, we observe and alert on behavior further down, using eBPF-based solutions to look for anomalous behavior at the kernel level on the hosts that make up our fleet.

An attacker with root on a Firecracker micro-VM running on one of our workers is our EXPECTED scenario. An IDS wouldn’t catch that attacker trying to exploit the newest horrible processor microcode bug or fuzzing our GPU stack. But instrumenting the OS kernel at the level below Firecracker will spot these things.
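For a feel of what "anomalous behavior at the kernel level" can mean, here is a hypothetical behavioral rule. It is not Fly.io's tooling. It assumes an eBPF probe (not shown) already delivers `execve()` events as records carrying a parent command name and a binary path; the `ExecEvent` type, the deny list, and the host names are all made up for illustration. The rule also assumes, for the sake of the example, that the VMM process on a worker should never start child processes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExecEvent:
    """Hypothetical record a kernel-level sensor might emit on each execve()."""
    host: str
    parent_comm: str  # command name of the process that called execve()
    filename: str     # path of the binary being executed

# Assumption for illustration: on a worker, the VMM process should never
# start child processes, so any exec from it is worth an alert.
NEVER_EXECS = frozenset({"firecracker"})

def alerts(events: list[ExecEvent]) -> list[str]:
    """Return an alert message for every exec whose parent is on the deny list."""
    return [
        f"{e.host}: {e.parent_comm} executed {e.filename}"
        for e in events
        if e.parent_comm in NEVER_EXECS
    ]

events = [
    ExecEvent("worker-1", "systemd", "/usr/bin/journalctl"),
    ExecEvent("worker-1", "firecracker", "/bin/sh"),  # e.g. a guest escaping into the VMM
]
for line in alerts(events):
    print(line)  # worker-1: firecracker executed /bin/sh
```

Note that the rule keys on what a process does on the host rather than on packet contents, so it doesn't depend on a stable network baseline at all.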

The above language is for you, dear reader, who has a curiosity about the technical details of why we choose not to rely on IDS/IPS, but might be a bit much for your auditors or prospective customers. To make life simpler for everyone, we give you this:

“Fly.io does not rely on a traditional IDS or IPS solution, as those solutions operate primarily at the network level and thus are not useful for detecting or preventing the vast majority of potential attacks against the platform. Rather, Fly.io relies on third-party and custom eBPF tooling to instrument the runtime kernel on hosts to detect and alert on anomalous and potentially malicious behavior.”

If that doesn’t work for you, let us know and we’ll help you out!

An aside about WAFs and Shared Responsibility

Web Application Firewalls are kindasorta in the same conceptual space as IDS/IPS, but WAFs protect web applications, as the name would suggest. Fly.io isn’t in the web application business, so a WAF wouldn’t really get us much. But that doesn’t mean YOU can’t or shouldn’t use a WAF if you want to, or if your auditor or customer or whomever is asking for it. To that end, Arcjet and Wafris are two WAFs that folks have deployed with their applications running on Fly.io.

More broadly, our responsibility is the protection of the systems and infrastructure that your applications run on, but responsibility for protection of your application is left up to you. Some folks call this a “Shared Responsibility Model”, and we’ll have more to say on that down the line.


This all sounds quite reasonable, thanks for writing it up! However, just like Fly partners with Tigris, Supabase, and Upstash to offer some great managed services, it would be great if Fly also found a CDN partner who could provide a managed WAF, DDoS protection, HTTP caching, and other common CDN features, within Fly’s infrastructure.


Not even by offering something like Coraza as a premium handler?
e.g. handlers = ["tls", "http", "coraza"]
