Sudden decrease in throughput, no recent changes

OldhamMade · September 19, 2022, 7:07am

Hey @fly, I’ve noticed a sudden decrease in throughput on my app. Screenshots below:

(Please ignore the tooltip on this one, it isn’t relevant)

I’m concerned because this app has been running for a long time without any changes. I’ve not deployed an update for quite a while, I haven’t changed any of the scaling settings, and I believe that my account is paid in full.

It would be great if someone at Fly.io could take a look and let me know if anything changed internally that requires some changes from my side to compensate.

Thanks!

jerome · September 19, 2022, 11:50am

This could be a new bug in our proxy. I’m restarting it everywhere. I think this will help if the bug is what I think it is.

Looks like all traffic is going to 2 instances (out of 20).

OldhamMade · September 19, 2022, 12:14pm

jerome: “I think this will help if the bug is what I think it is.”

Yep, it looks like throughput is back up to expected levels.

Is this something that I can expect to experience again in the future, or more of a one-time issue related to a deploy of new proxy code, @jerome?

jerome · September 19, 2022, 12:23pm

It might happen again before we fix it (once we’ve found the root cause), but of course you shouldn’t expect issues like these to happen. This was not intended.

OldhamMade · September 19, 2022, 12:43pm

Cool, I’ll reply on this thread and @ you (and anyone else you think should be alerted) if it happens again.

Thanks for the speedy fix!

OldhamMade · September 29, 2022, 7:49am

Hey @jerome it has happened again. My app is down to around 15% of normal throughput. Speedy assistance would be appreciated!

jerome · September 29, 2022, 11:51am

Looking into a more permanent fix this time. We have to keep the current issue ongoing while we investigate though!

OldhamMade · September 29, 2022, 12:06pm

Thanks for the time you’re spending looking into this! Much appreciated.

amos · September 29, 2022, 3:49pm

I’ve looked at this today and I deployed a change that I hope will resolve the issue. Right now things are looking nice and healthy again, but since this bug takes a couple days to show up, don’t hesitate to ping us again if/when it does!

OldhamMade · September 29, 2022, 5:19pm

Hey @amos, @jerome, looks like the fix hasn’t worked, throughput has plummeted again!

amos · September 29, 2022, 5:34pm

Looking again, thanks for the ping!

jerome · September 29, 2022, 8:08pm

We pushed out another fix earlier and are monitoring the situation. Do ping us if we doze off!

OldhamMade · October 21, 2022, 11:03am

Hey @amos & @jerome, looks like we’re getting some odd behaviour again.

Last 24hrs:

And last 6hrs:

To clarify, there have not been any new deploys in the last month, at least, and we’re running a good number of instances.

Any insight you can provide would be wonderful!

wjordan · October 21, 2022, 4:09pm

The gaps in metrics are explained by recent incidents with the metrics cluster:
