Routing layer visibility

jsierles · April 27, 2021, 4:55pm

One issue with platforms like Heroku is the opaque routing layer. You get a few general metrics from it, but lack visibility into or control over things like:

chosen routing algorithm (random afaik)
websocket connections (only logged when a connection completes)
headers (not loggable)
time series metrics (none available generally)

It would be interesting to know if we could get insight into any or all of these in the future. One header that does get passed on a few platforms is X-Request-Start, which helps you understand whether requests are being queued before they reach the VM layer. Anyway, just thought I’d raise this topic and see what your thoughts are on routing and visibility.

jerome · April 27, 2021, 5:05pm

We’ve been thinking about logging HTTP information per-request. We don’t even do that right now to save on logging.

We mostly have 2 algorithms, though they’re not chosen dynamically. We do “power of 2 random choices” out of healthy, nearby instances.
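
For readers unfamiliar with the term, here is a minimal sketch of the general "power of two random choices" technique. It is not Fly's proxy code: the `Instance` type, the `active_requests` load signal, and the `pick_instance` helper are all hypothetical, and the "nearby" filtering mentioned above is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    healthy: bool
    active_requests: int  # hypothetical load signal, not a real Fly metric


def pick_instance(instances, rng=random):
    """Sample two healthy instances at random and return the less loaded one."""
    candidates = [i for i in instances if i.healthy]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    a, b = rng.sample(candidates, 2)
    return a if a.active_requests <= b.active_requests else b
```

Comparing only two random candidates avoids scanning every instance on each request while still steering traffic away from the busiest ones.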

I’d be happy to add logging for this. What are you looking for specifically?

If this were to happen, would you like to specify which ones to log, or do you want all of them? For some apps, that would mean tens of thousands of logs per second.

We do offer Prometheus time series for our proxy: Metrics on Fly.io · Fly Docs

X-Request-Start is interesting. I’d be happy to add that. This might be encoded in our request IDs too; however, I’m not sure if it’s monotonic or since epoch. I’ll have to check.

I love posts like this! There’s a lot of quick things we can add that can make your life easier.

One thing I was going to add soon is: logging error statuses produced by our proxy. Mostly 502s, but we also produce some 503s. We have reasons for most of these and should expose that to our users, via logs.

jsierles · April 27, 2021, 5:19pm

Fair enough. I think ‘off by default’ is reasonable, but it should be possible to enable.

I’d want to be able to specify this, but it’s not that important. The use case I’ve had in the past is debugging requests that never make it to the VMs (for whatever reason) or are rejected by app-level rate limiting.

Nice! Will check that out.

Cool - that can work. Using a common default would make things ‘just work’ with many APM services like ScoutAPM out of the box: Scout APM Documentation

Good to know - will make some more from my PaaS wish/complaint list

This makes sense. One thing that I think Heroku did well was assign an error code to these different conditions so they can be tracked independently in log analyzers.

jsierles · April 27, 2021, 5:21pm

A few things I missed. For logging websockets, it would be helpful to see both start and end, since these requests can live for a long time.

Also it would be interesting to see open websocket count in the time series metrics, if possible.

kurt · April 27, 2021, 11:10pm

Websocket info in the time series metrics is a good idea. Right now you end up seeing weird 95th percentile metrics on HTTP requests from apps that use websockets. Splitting those out could be really nice.
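
To illustrate why mixing the two skews the numbers, here is a small sketch with made-up durations. The sample values and the nearest-rank `percentile` helper are assumptions for illustration only, not how the Fly metrics are computed.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical durations in seconds: 90 short HTTP requests, 10 long-lived websockets.
http_durations = [0.05] * 90
websocket_durations = [600.0] * 10

p95_http_only = percentile(http_durations, 95)
p95_mixed = percentile(http_durations + websocket_durations, 95)
```

With these sample values, the HTTP-only p95 stays at the short request duration, while the mixed p95 lands on a websocket connection lifetime, which says nothing about request latency.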

jerome · May 3, 2021, 8:48pm

@jsierles I’ve deployed a change that, once you upgrade flyctl, shows proxy errors (502s and some 503s) and their reasons (not all documented yet) in app logs.

Hopefully this helps everybody understand a bit more when errors occur. Usually it’s because the connection was abruptly severed between us and the app.

I plan on adding X-Request-Start soon. That’s a much smaller feature, and it sounds useful too.

jerome · May 7, 2021, 3:11pm

I’ve added X-Request-Start: t=<microseconds since epoch>.
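
As a rough sketch of how an app could use this header, the helper below parses the `t=<microseconds since epoch>` value and estimates how long the request waited before reaching the app. The `queue_time_ms` name and the clamping behaviour are assumptions for illustration; APM agents such as ScoutAPM do their own parsing.

```python
import time


def queue_time_ms(header_value, now_us=None):
    """Return milliseconds since the proxy stamped the request, or None if unparseable."""
    if not header_value:
        return None
    value = header_value.strip()
    if not value.startswith("t="):
        return None
    try:
        start_us = int(value[2:])
    except ValueError:
        return None
    if now_us is None:
        now_us = time.time_ns() // 1_000  # current time in microseconds
    # Clock skew between the proxy and the app host can make this negative; clamp at zero.
    return max(0.0, (now_us - start_us) / 1_000)
```

For example, `queue_time_ms("t=1620400000000000", now_us=1620400000250000)` returns `250.0`, meaning the request spent about 250 ms between the proxy and the app.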

jsierles · May 7, 2021, 4:00pm

Nice! I’ll test it out with ScoutAPM.
