proxy error.message=["Undocumented"]

Hello Fly!

I’m getting some (concerning?) error messages rolling through my fly logs:

2021-09-08T17:29:59.342616396Z proxy[628be8a1] chi [error] error.code=1 error.message="Undocumented" request.method="GET" request.url="/socket/websocket?[PARAMS REDACTED]" response.status=502

My app lives in ewr right now but I’m getting the errors in chi and dfw, which is where most of my customers are located.

Is this a config issue? Websockets? Thanks for the helping hand.

Lanny

Hmm… so I’ll just rubber duck myself on this one. :slight_smile:

My fix (for the moment) appears to be adjusting my concurrency limit in fly.toml. My Fly dashboard showed me maxed out at the copy-pasted hard limit of 25 because of persistent websocket connections, and I think it’s possible the Fly routing layer wasn’t letting anyone else through to my app server once it hit that limit.
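
For reference, the limit in question lives in the concurrency block of a service definition in fly.toml. Below is a minimal sketch of what a copy-pasted config might look like. Only the hard limit of 25 comes from this thread; the internal port and the soft limit are assumptions for illustration.

```toml
# fly.toml (fragment): illustrative sketch, not the actual config from this thread
[[services]]
  internal_port = 4000    # assumed: the port the Phoenix app listens on
  protocol = "tcp"

  [services.concurrency]
    type = "connections"  # count concurrent connections against the limits
    hard_limit = 25       # from the thread: long-lived websockets filled this up
    soft_limit = 20       # assumed value, not stated in the thread
```

Because each websocket stays open, every connected client holds one slot for as long as it is connected, so a hard limit of 25 means roughly 25 connected clients per VM.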

For the Fly folks, if that’s true I think it would be helpful to document the potential consequences of this (especially given how much Phoenix folks love their websockets :grinning:) in the fly.toml Docs.

Normally if you’re hitting concurrency limits, you’ll see a message saying “Hard limit reached” in the logs. The undocumented error might be related to it being a websocket connection, though. If it errors after the upgrade, that could be confusing our error handling.

We should definitely expand the Phoenix docs with notes about websockets, that’s a good idea.

Kurt, thanks for replying!

My production app is a pretty small business, so I feel like I’m unlikely to really run into (non-self-imposed) ceilings here. And, I don’t want to foot-gun myself again on this.

I can see some potential outs here:

  • Actually set up autoscaling so if I do hit the limit that traffic will have somewhere to go
  • Remove the limits altogether and let the container running out of memory (??) be my limit
  • Set a limit that’s kind of preposterously high (for my customer base at the moment) and have autoscaling be my last resort in case Elon mentions me on Twitter or something :wink:

Do you have a recommendation there? Or at least some metrics to watch as I plot next moves?

I’d go for option 3. Even small Elixir VMs can handle ~500 concurrent connections, so setting the hard limit to 500 or so seems safe. Preposterously high would be like 2500.

Autoscaling is slow to start Elixir nodes so it probably won’t help much right now.

What’s the difference in behavior between soft cap and hard cap? Is it something like soft cap = spin up a new instance, hard cap = start rejecting requests?

Yes that’s it exactly. Scaling is metrics based, so it’s a little lagged, but if your VMs average more than the soft limit it’ll scale up. So you could set a lowish soft cap, then a high hard cap to give it a buffer while it scales.
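
As a rough sketch of that "lowish soft cap, high hard cap" setup: the hard limit of 500 is the figure suggested earlier in this thread, while the soft limit of 200 and the internal port are assumed values chosen only to illustrate the gap between the two.

```toml
# fly.toml (fragment): illustrative values, tune for your own app
[[services]]
  internal_port = 4000    # assumed: the port the Phoenix app listens on
  protocol = "tcp"

  [services.concurrency]
    type = "connections"
    soft_limit = 200      # assumed "lowish" value; averaging above this prompts a scale-up
    hard_limit = 500      # from the thread; above this, new connections are rejected
```

The gap between the two numbers is the buffer: existing VMs keep accepting connections up to the hard limit while the (lagged) scaling catches up.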

“Undocumented” errors are just that: undocumented :confused:. There are many scenarios where this can happen.

Can you provide the request.id of that log line you pasted? I might be able to dig into it and find out what the actual error was and perhaps document it :slight_smile:

Replied via DM

For future reference, request IDs are (intentionally) safe to share. :slight_smile:

Hah, fair. Thanks again, Kurt!