sum by (region) (rate(fly_app_http_responses_count[$__interval]))
The traffic seems to be automatically routed to other regions, so the only effect on users is a slower response time. However, this is not ideal, especially since we are paying for the machine in that region.
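As a sketch (assuming the same fly_app_http_responses_count metric and Grafana's $__interval variable), dividing the per-region rate by the overall rate turns the same data into a traffic share per region, which makes it obvious when one region drops to zero while the others absorb the load:

sum by (region) (rate(fly_app_http_responses_count[$__interval]))
  / scalar(sum(rate(fly_app_http_responses_count[$__interval])))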
When looking at the status of the app, everything looks fine:
Deployment Status
ID = ...
Version = v272
Status = successful
Description = Deployment completed successfully
Instances = 3 desired, 3 placed, 3 healthy, 0 unhealthy
Instances
ID        PROCESS  VERSION  REGION  DESIRED  STATUS   HEALTH CHECKS       RESTARTS  CREATED
827d081b  app      272      lhr     run      running  2 total, 2 passing  0         1h54m ago
ea577808  app      272      syd     run      running  2 total, 2 passing  0         1h55m ago
b545f3a1  app      272      lax     run      running  2 total, 2 passing  0         1h56m ago
Can someone explain to me how the routing works? I can find very little information about it in the docs.
Since it looks like you have one VM per region, this behavior could also be caused by a problem with individual instances. At first glance, I’m not seeing any errors with your app’s traffic as it passes through our infra.
You might have already done this, but you can check for restarts with fly status --all and investigate instance logs with fly logs -i. This information might help you narrow things down further!
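Concretely, that would look something like this (the instance ID here is just the lhr one from the status output above):

fly status --all
fly logs -i 827d081b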
What are the circumstances that would lead to no more traffic being routed to a region?
Quite a few! Since it sounds like your instances are healthy, you may want to check whether you have a hard_limit defined. If this value is exceeded, then that instance will no longer accept traffic. With one instance per region, this would effectively re-route traffic to a different region.
This should show up in your app’s logs, though. You could rule this out by deploying a second instance in the region where you aren’t seeing any traffic.
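For reference, the concurrency limits live under the services section of fly.toml; the values below are only placeholders, so adjust them to your app:

[[services]]
  internal_port = 8080
  protocol = "tcp"

  [services.concurrency]
    type = "connections"
    hard_limit = 25
    soft_limit = 20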
I have seen some hard limits being hit, but I was assuming that hard_limit would just throttle the number of requests going to that instance rather than stop sending it any traffic at all.
By the way, we have switched from balanced autoscaling to standard, and since then I have not seen the issue again.
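For anyone wanting to make the same switch, the (legacy) autoscale commands look roughly like this; the min/max counts are just examples:

fly autoscale show
fly autoscale standard min=3 max=6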