Hey folks. We made a little change to our API that should reduce the number of intermittent EOF
, 503
, and 502
errors you’ve been seeing lately. If you’re still encountering them, please let us know!
Here’s some details if you’re curious…
The Fly API is made up of several apps – a regular 'old Rails app and a few go services – all running on Fly. (In fact, our stack is almost entirely normal Fly apps running right alongside yours. Turtles all the way down!) In front of these apps is an haproxy cluster at api.fly.io which proxies requests to the right backend.
haproxy used *.fly.dev
addresses for backends which caused connections to go through fly-proxy
twice:
ingress -> fly-proxy -> haproxy -> fly-proxy -> rails/etc
This extra layer isn’t great for performance and occasionally confused haproxy into closing in-flight connections. This is where the EOF errors come from.
After today, haproxy is connecting directly to backends over the private network (app-name.internal
), which skips the second fly-proxy
hop. So far metrics look promising and hopefully EOFs are under control.