Slow S3 uploads for files larger than ~30KB

Hi there,

I’ve been prototyping an app on Fly.io using Node. I have an endpoint where users can POST a file to be uploaded into an S3 bucket.

When trying this out locally, upload latencies were reasonable, under ~100ms. However, the same service deployed to Fly takes over 3 seconds per upload for files larger than ~30KB. For smaller files the latencies are in the same ballpark as local dev, which makes this even more puzzling.

I’ve tried scaling up to 2x dedicated CPU but that resulted in no changes to the upload latencies. The S3 bucket is located in us-west-1 and my Fly app is in sjc, so that should rule out physical distance as a cause as well.

Any ideas what could be the issue? Happy to provide more details over DM to help with the investigation (my prototype doesn’t have auth at the moment haha…).

Thanks!

Wow that’s a pain. Where are you connecting from to get to your sjc app? You can see this at https://debug.fly.dev

This could just be some weird interaction between our proxy and your app. You can test this by connecting to your app over 6PN and testing upload speed. You’ll need to make sure your app listens on :: (instead of 0.0.0.0) to be available over the private network.
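For reference, binding to :: in Node could look roughly like the sketch below. It assumes a plain node:http server on port 8080; startServer and the byte-counting handler are hypothetical stand-ins, not your actual upload endpoint.

```javascript
const http = require("node:http");

// Hypothetical helper: starts an HTTP server bound to the IPv6 wildcard
// address "::" so it is reachable over the private network.
// The handler only counts the bytes it receives; a real app would
// forward the upload to S3 here instead.
function startServer(port = 8080, host = "::") {
  const server = http.createServer((req, res) => {
    let received = 0;
    req.on("data", (chunk) => {
      received += chunk.length;
    });
    req.on("end", () => {
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ received }));
    });
  });

  return new Promise((resolve, reject) => {
    server.once("error", reject);
    // Passing the host explicitly makes the bind address obvious,
    // rather than relying on a framework default such as 0.0.0.0.
    server.listen(port, host, () => resolve(server));
  });
}
```

Calling startServer() with no arguments binds to [::]:8080. If you use a framework such as Express, the same host argument can usually be passed to its listen call.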

Once you have that listening, you can create a wireguard connection with fly wg create, connect, and then do something like this to hit your app:

curl "http://[<private ip>]:8080/path/to/url"

You can get the private IP with fly ips private. Also, make sure the URL includes the port your app is listening on.
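If you want numbers rather than eyeballing it, a small Node script can time uploads on either side of the ~30KB mark. This is only a sketch: timeUpload and compareSizes are hypothetical helpers, and it assumes the endpoint accepts a raw POST body over plain HTTP. If yours expects multipart form data, the request would need adjusting.

```javascript
const http = require("node:http");
const { performance } = require("node:perf_hooks");

// Hypothetical helper: POSTs `sizeBytes` of filler data to `url` and
// resolves with the status code and the elapsed time in milliseconds.
function timeUpload(url, sizeBytes) {
  const body = Buffer.alloc(sizeBytes, "a");
  return new Promise((resolve, reject) => {
    const start = performance.now();
    const req = http.request(
      url,
      {
        method: "POST",
        headers: {
          "Content-Type": "application/octet-stream",
          "Content-Length": body.length,
        },
      },
      (res) => {
        res.resume(); // drain the response so "end" fires
        res.on("end", () =>
          resolve({ status: res.statusCode, ms: performance.now() - start })
        );
      }
    );
    req.on("error", reject);
    req.end(body);
  });
}

// Runs one upload per size, sequentially, so the requests do not
// compete with each other for bandwidth.
async function compareSizes(url, sizes = [10 * 1024, 30 * 1024, 100 * 1024]) {
  const results = [];
  for (const size of sizes) {
    const { status, ms } = await timeUpload(url, size);
    results.push({ size, status, ms });
  }
  return results;
}
```

Run compareSizes once against the private address over WireGuard and once against the app's plain-HTTP public address (HTTPS would need node:https instead), then compare the ms values per size. A jump that only appears on the public side would point at the proxy rather than at the app or S3.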

Really appreciate the super quick response Kurt!

Gave this a try just now, and hitting the private IP + port through WireGuard does indeed bring the upload latency for >30KB files down to the same ballpark as when running locally (~100-200ms).

I guess that confirms your hypothesis of this being a proxy issue? Let me know what else I can do to help dig in further.

EDIT: Oh and I’m connecting from sjc according to the link you sent.

Quick update: It looks like the issue has disappeared overnight! Latencies seem to be back to normal now.

Well that’s unexpected. Let us know if it comes back up? We’ve been adding some instrumentation to see if we can detect any kind of slowdowns.

Will do 👍