App was down for 15 mins (returned 403 Forbidden)

One of my apps in the AMS region was down today for about 15 minutes, according to external monitoring. Metrics and logs don't show anything wrong. Both the .fly.dev domain and the main domain (via Cloudflare) returned 403.

2022-07-21
17:38-17:55 GMT

Response headers: {
  "via" => "1.1 fly.io",
  "date" => "Thu, 21 Jul 2022 17:38:05 GMT",
  "server" => "Fly/50de8a7b (2022-07-11)",
  "x-runtime" => "0.001160",
  "content-type" => "text/plain",
  "x-request-id" => "bda3e085-e418-49b3-8e49-33c9a8cb4079",
  "cache-control" => "no-cache",
  "fly-request-id" => "01G8GXD2VWQSQ7DFKFH5HZD6Y3-sin",
  "transfer-encoding" => "chunked"
}

Nope! AMS is humming along just fine.

Our infrastructure doesn't generate 403 errors, though. Based on that, plus the x-runtime and x-request-id headers, which our proxy doesn't add, I'm going to say the error came from somewhere else, most likely your app itself.
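A quick way to apply that reasoning to a captured response, as a sketch: check for headers that the app's own middleware adds. In a Rails app, `Rack::Runtime` typically sets `X-Runtime` and `ActionDispatch::RequestId` sets `X-Request-Id`, while Fly's proxy adds headers such as `Via` and `Fly-Request-Id`. The helper name `from_app?` and the header list are assumptions for illustration, not a Fly or Rails API.

```ruby
# Hypothetical list: headers added by the app's Rack middleware, not by Fly's proxy.
APP_HEADERS = %w[x-runtime x-request-id].freeze

# Hypothetical helper: true if any app-added header is present.
def from_app?(headers)
  names = headers.keys.map(&:downcase) # header names are case-insensitive
  APP_HEADERS.any? { |name| names.include?(name) }
end

# A subset of the headers from the failing response in this thread.
failing = {
  "via" => "1.1 fly.io",
  "server" => "Fly/50de8a7b (2022-07-11)",
  "x-runtime" => "0.001160",
  "x-request-id" => "bda3e085-e418-49b3-8e49-33c9a8cb4079",
  "fly-request-id" => "01G8GXD2VWQSQ7DFKFH5HZD6Y3-sin"
}

# A response carrying only proxy headers would not match.
proxy_only = { "via" => "1.1 fly.io", "server" => "Fly/50de8a7b (2022-07-11)" }

puts from_app?(failing)    # prints true
puts from_app?(proxy_only) # prints false
```

This is only a heuristic: an app can be configured without these middlewares, so their absence proves nothing.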

Thanks, Kurt.

I'll keep investigating. It's a weird case for a simple health check endpoint.

It turns out it was the rack-attack gem in the Rails app: it blocked all traffic for 5 minutes, three times in a row. It counted every request as coming from a single IP (I assume it's Fly's proxy), which triggered its throttling rule.
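A minimal Ruby sketch of why this happens, assuming a fixed-window counter like the one behind Rack::Attack's `throttle` (this is not the gem's code; `FixedWindowThrottle`, `client_key`, and the proxy address are hypothetical). When every request reaches the app from the same peer address, all visitors share one counter, so a limit meant for a single client trips for everyone. Keying on the `Fly-Client-IP` header that Fly's proxy adds gives each client its own counter. Caveat: for traffic that comes through Cloudflare, the address Fly sees is a Cloudflare edge IP; Cloudflare puts the visitor's IP in `CF-Connecting-IP`, but anyone hitting the .fly.dev hostname directly can forge that header, so only trust it for requests known to come from Cloudflare.

```ruby
# Sketch of fixed-window request counting (the idea behind a throttle rule).
class FixedWindowThrottle
  def initialize(limit:, period:)
    @limit = limit        # max requests per key per window
    @period = period      # window length in seconds
    @counts = Hash.new(0) # [key, window index] => request count
  end

  # Counts one request for `key` at time `now` (seconds) and returns
  # true when that key has exceeded the limit in the current window.
  def throttled?(key, now)
    bucket = [key, (now / @period).floor]
    @counts[bucket] += 1
    @counts[bucket] > @limit
  end
end

# Hypothetical discriminator: prefer Fly-Client-IP (Rack exposes it as
# HTTP_FLY_CLIENT_IP) and fall back to the TCP peer address.
def client_key(env)
  env["HTTP_FLY_CLIENT_IP"] || env["REMOTE_ADDR"]
end

proxy_ip = "172.16.0.1" # assumed peer address the app sees for every request
requests = (1..10).map do |i|
  { "REMOTE_ADDR" => proxy_ip, "HTTP_FLY_CLIENT_IP" => "203.0.113.#{i}" }
end

naive = FixedWindowThrottle.new(limit: 5, period: 300)
keyed = FixedWindowThrottle.new(limit: 5, period: 300)

# Ten different clients, all inside the same 5-minute window.
naive_blocked = requests.count { |env| naive.throttled?(env["REMOTE_ADDR"], 1_000) }
keyed_blocked = requests.count { |env| keyed.throttled?(client_key(env), 1_000) }

puts "keyed on peer address: #{naive_blocked} of 10 blocked"  # 5 of 10
puts "keyed on Fly-Client-IP: #{keyed_blocked} of 10 blocked" # 0 of 10
```

In an actual Rack::Attack config, the equivalent change would be returning that header from the rule's block, e.g. `Rack::Attack.throttle("req/ip", limit: 300, period: 5.minutes) { |req| req.get_header("HTTP_FLY_CLIENT_IP") || req.ip }`, where the limit and period are placeholders. Also note that, by default, Rack::Attack answers throttled requests with 429 and blocklisted ones (including Fail2Ban/Allow2Ban bans) with 403, so a 403 may point at a ban-style rule rather than a plain throttle.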