Hey folks. We made a little change to our API that should reduce the number of intermittent
502 errors you’ve been seeing lately. If you’re still encountering them, please let us know!
Here’s some details if you’re curious…
The Fly API is made up of several apps – a regular 'old Rails app and a few go services – all running on Fly. (In fact, our stack is almost entirely normal Fly apps running right alongside yours. Turtles all the way down!) In front of these apps is an haproxy cluster at api.fly.io which proxies requests to the right backend.
*.fly.dev addresses for backends which caused connections to go through
ingress -> fly-proxy -> haproxy -> fly-proxy -> rails/etc
This extra layer isn’t great for performance and occasionally confused haproxy into closing in-flight connections. This is where the EOF errors come from.
After today, haproxy is connecting directly to backends over the private network (
app-name.internal), which skips the second
fly-proxy hop. So far metrics look promising and hopefully EOFs are under control.