Fly replay resulting in an error

I’m getting the following error after attempting to replay a request from ams to my primary in den.

error.message="could not send HTTP request to instance: http2 error: stream error received: unspecific protocol error detected" 2023-01-30T16:05:52Z proxy[5f0b30a4] sjc [error]request.method="POST" request.url="https://kentcdodds.com/calls/record/new?_data=routes%2Fcalls.record%2Fnew" request.id="01GR1PTYHGSR2F0NKFBVAHWY6A-sjc" response.status=502 
error.message="could not send HTTP request to instance: error from user's HttpBody stream: error reading a body from connection: stream error detected: unspecific protocol error detected" 2023-01-30T16:05:57Z proxy[5f0b30a4] den [error]request.method="POST" request.id="01GR1PTYHGSR2F0NKFBVAHWY6A-sjc" 

My own logs before this show that my application in ams was sending the replay:

2023-01-30T16:05:52Z app[92d83e6f] ams [info]Replaying: {
2023-01-30T16:05:52Z app[92d83e6f] ams [info]  pathname: '/calls/record/new',
2023-01-30T16:05:52Z app[92d83e6f] ams [info]  method: 'POST',
2023-01-30T16:05:52Z app[92d83e6f] ams [info]  currentInstance: '92d83e6f',
2023-01-30T16:05:52Z app[92d83e6f] ams [info]  currentIsPrimary: false,
2023-01-30T16:05:52Z app[92d83e6f] ams [info]  primaryInstance: '5f0b30a4'
2023-01-30T16:05:52Z app[92d83e6f] ams [info]}

I’m uncertain why sjc is mentioned in those logs because I don’t have a region in sjc.

This is a POST with a Content-Type of application/x-www-form-urlencoded;charset=UTF-8 and a Content-Length of 1139895 bytes (about 1.1 MB, quite large). Maybe that has something to do with it? Specifically, it’s the audio recorder upload for the Call Kent Podcast.

You can find the source code for the replay in kentcdodds.com/fly.ts at commit 96d76de72a4a48089f2eb22a88a6ad1c6f847fa1 in the kentcdodds/kentcdodds.com repository on GitHub. It’s an Express middleware that’s applied pretty early in my middleware chain (see kentcdodds.com/index.ts at the same commit).
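For readers who don’t want to dig through the repo, here is a minimal sketch of what a replay middleware like this can look like. It is not the code in fly.ts. It assumes the current and primary instance IDs come from a hypothetical `getConfig` callback, and it uses structural types instead of importing Express so it runs on its own. Non-read requests on a non-primary instance get a `fly-replay: instance=<id>` response header, which asks Fly’s proxy to resend the original request to that instance; the log call mirrors the "Replaying:" output above.

```typescript
// Structural stand-ins for Express's Request/Response so this sketch runs
// without installing Express; real Express objects satisfy these shapes.
interface ReplayRequest {
  method: string
  path: string
}

interface ReplayResponse {
  set(field: string, value: string): unknown
  sendStatus(code: number): unknown
}

// Hypothetical config: how you learn these IDs (env vars, LiteFS, etc.) is up to you.
interface ReplayConfig {
  currentInstance: string
  primaryInstance: string
}

const READ_METHODS = new Set(['GET', 'HEAD', 'OPTIONS'])

function createReplayMiddleware(getConfig: () => ReplayConfig) {
  return function replayMiddleware(
    req: ReplayRequest,
    res: ReplayResponse,
    next: () => void,
  ): void {
    const { currentInstance, primaryInstance } = getConfig()
    // Reads, and anything already running on the primary, are handled locally.
    if (READ_METHODS.has(req.method.toUpperCase()) || currentInstance === primaryInstance) {
      next()
      return
    }
    console.info('Replaying:', {
      pathname: req.path,
      method: req.method,
      currentInstance,
      primaryInstance,
    })
    // Ask Fly's proxy to resend this request to the primary instance.
    res.set('fly-replay', `instance=${primaryInstance}`)
    // 409 is an arbitrary choice for this sketch, which relies on the proxy
    // acting on the fly-replay header rather than on the status code.
    res.sendStatus(409)
  }
}
```

Because the proxy has to resend the whole original request, including its body, the size of that body is worth keeping in mind for a route like the audio upload.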

Any tips or advice are welcome.

This could be a transient error. I’m not entirely sure. Does it happen consistently?

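Request IDs include the edge region where we accepted the user’s connection (note the -sjc suffix on the request ID), so sjc is the edge that received the request, not one of your app’s regions.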
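To illustrate, here is a small helper that pulls the edge region off the end of a request ID like the one in the logs above. The "everything after the last `-`" rule is an assumption based on the IDs in this thread, not a documented format, and `edgeRegionFromRequestId` is a hypothetical name.

```typescript
// Assumption: the edge region is the suffix after the last "-" in the request ID.
function edgeRegionFromRequestId(requestId: string): string | null {
  const dash = requestId.lastIndexOf('-')
  // No dash, or a trailing dash with nothing after it: no region to report.
  if (dash === -1 || dash === requestId.length - 1) return null
  return requestId.slice(dash + 1)
}

console.log(edgeRegionFromRequestId('01GR1PTYHGSR2F0NKFBVAHWY6A-sjc')) // prints sjc
```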

The request size shouldn’t be a problem.

No, it doesn’t happen consistently, though it did happen more than three times in a row to one of my users :grimacing: He was very patient about it.

It looks like it happens when the form body is extra large. I tried about 10 seconds of recording (content-length 101210) and it worked fine. With 41 seconds (381077) it 502ed on me with these errors:

2023-01-30T18:22:38Z proxy[5f0b30a4] sjc [error]could not send HTTP request to instance: http2 error: stream error received: unspecific protocol error detected
2023-01-30T18:22:43Z proxy[5f0b30a4] den [error]could not send HTTP request to instance: error from user's HttpBody stream: error reading a body from connection: stream error detected: unspecific protocol error detected

I tried with 20 seconds (191006) and it also 502ed:

2023-01-30T18:23:50Z proxy[5f0b30a4] sjc [error]could not send HTTP request to instance: http2 error: stream error received: unspecific protocol error detected
2023-01-30T18:23:55Z proxy[5f0b30a4] den [error]could not send HTTP request to instance: error from user's HttpBody stream: error reading a body from connection: stream error detected: unspecific protocol error detected

So I tried with 10 seconds again and that worked fine.

So then I tried with 15 seconds (149047) and that 502ed:

2023-01-30T18:25:47Z proxy[5f0b30a4] sjc [error]could not send HTTP request to instance: http2 error: stream error received: unspecific protocol error detected
2023-01-30T18:25:52Z proxy[5f0b30a4] den [error]could not send HTTP request to instance: error from user's HttpBody stream: error reading a body from connection: stream error detected: unspecific protocol error detected

12 seconds (115651) worked.

I tried 17 seconds (171403) and surprisingly it worked that time.

So I tried 30 seconds (286134) and it failed again with the same errors.

So it appears that somewhere above 115651 bytes it’s hit or miss, and in my tests every request of 191006 bytes or more failed.
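Here are the test results collected in one place. This sketch only records the content lengths reported above (the repeat 10-second run is left out because its size wasn’t given) and computes the largest request that succeeded and the smallest that failed. When the smallest failure is below the largest success there is no clean cutoff, which matches the "hit or miss" observation.

```typescript
// Outcomes of the test uploads described above (content-length in bytes).
interface Attempt {
  seconds: number
  contentLength: number
  ok: boolean
}

const attempts: Attempt[] = [
  { seconds: 10, contentLength: 101210, ok: true },
  { seconds: 41, contentLength: 381077, ok: false },
  { seconds: 20, contentLength: 191006, ok: false },
  { seconds: 15, contentLength: 149047, ok: false },
  { seconds: 12, contentLength: 115651, ok: true },
  { seconds: 17, contentLength: 171403, ok: true },
  { seconds: 30, contentLength: 286134, ok: false },
]

function summarize(results: Attempt[]) {
  const ok = results.filter((a) => a.ok).map((a) => a.contentLength)
  const failed = results.filter((a) => !a.ok).map((a) => a.contentLength)
  const largestSuccess = Math.max(...ok)
  const smallestFailure = Math.min(...failed)
  return {
    largestSuccess,
    smallestFailure,
    // True when some failure is smaller than some success, i.e. no clean cutoff.
    overlapping: smallestFailure < largestSuccess,
    // Smallest failing size above every success: from here up, everything failed.
    alwaysFailsFrom: Math.min(...failed.filter((n) => n > largestSuccess)),
  }
}

// → largestSuccess 171403, smallestFailure 149047, overlapping true, alwaysFailsFrom 191006
console.log(summarize(attempts))
```

Every failing size in this table is also below 1,000,000 bytes.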

Also, a few times during these tests, I got a 502 and this error:

2023-01-30T18:32:43Z proxy[5f0b30a4] den [error]could not send HTTP request to instance: connection error: timed out

I just read this:

Attempting to replay requests larger than 1MB will throw an error.

So that probably explains the issue. I suppose I’ll have to come up with a workaround :slightly_frowning_face:
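One possible starting point for a workaround is to check the request size before setting the replay header, so an oversized upload gets a deliberate response from the app instead of a proxy 502. This is a minimal sketch, not a confirmed fix: `planReplay` and `REPLAY_BODY_LIMIT_BYTES` are hypothetical names, and because the docs only say "1MB", the conservative reading of 1,000,000 bytes is an assumption.

```typescript
// Assumption: "1MB" read conservatively as 1,000,000 bytes (1,048,576 is the
// other plausible reading).
const REPLAY_BODY_LIMIT_BYTES = 1_000_000

type ReplayPlan = 'handle-locally' | 'replay' | 'reject-too-large'

function planReplay(
  method: string,
  isPrimary: boolean,
  contentLengthHeader: string | undefined,
  limit: number = REPLAY_BODY_LIMIT_BYTES,
): ReplayPlan {
  const isWrite = !['GET', 'HEAD', 'OPTIONS'].includes(method.toUpperCase())
  // Reads, and writes already on the primary, never need a replay.
  if (!isWrite || isPrimary) return 'handle-locally'
  // Without a usable Content-Length (e.g. a chunked body) the size can't be
  // checked up front, so this sketch errs on the side of not replaying.
  const raw = contentLengthHeader?.trim()
  if (raw === undefined || !/^\d+$/.test(raw)) return 'reject-too-large'
  return Number(raw) > limit ? 'reject-too-large' : 'replay'
}

console.log(planReplay('POST', false, '1139895')) // prints reject-too-large
```

What to do with `reject-too-large` is up to the app, for example responding with a 413 or sending the upload somewhere that doesn’t need a replay. On its own, this check would not have caught the smaller failures in the tests above.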