Odd number of restarts in single region

OldhamMade · April 12, 2023, 8:05am

Hi, I’m currently seeing a significant increase in the number of requests to my app, and have scaled my app to 50 nodes to handle the throughput.

While scaling, I noticed the following:

ID          REGION   HEALTH CHECKS                 RESTARTS    CREATED
c23a0196    mia      running 2 total, 2 passing    30          11h56m ago
67276b05    mia      running 2 total, 2 passing    20          11h56m ago
430508b3    bos      running 2 total, 2 passing    0           11h45m ago
ccc4233e    sin      running 2 total, 2 passing    0           11h45m ago
17368be9    fra      running 2 total, 2 passing    0           11h45m ago
...[a number of other instances all with 0 restarts]...

It seems nodes in the mia region are restarting way too often. This is a true micro-service: no state is held at any node, so I don’t think it is application-related. The deployment should be identical to the other nodes, which are all running without any restarts.

Is this a known issue in the mia region?
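
For anyone comparing regions on a larger app, the status output above can be tallied per region with a short script. This is only a minimal sketch: it assumes the columns are separated by two or more spaces, as in the paste above, and `restarts_by_region` is a hypothetical helper name, not part of flyctl.

```python
import re
from collections import defaultdict

# Rows copied from the status output pasted above.
STATUS = """\
ID          REGION   HEALTH CHECKS                 RESTARTS    CREATED
c23a0196    mia      running 2 total, 2 passing    30          11h56m ago
67276b05    mia      running 2 total, 2 passing    20          11h56m ago
430508b3    bos      running 2 total, 2 passing    0           11h45m ago
ccc4233e    sin      running 2 total, 2 passing    0           11h45m ago
17368be9    fra      running 2 total, 2 passing    0           11h45m ago
"""


def restarts_by_region(text):
    """Sum the RESTARTS column per REGION (assumes columns split by 2+ spaces)."""
    totals = defaultdict(int)
    for line in text.splitlines():
        cols = re.split(r"\s{2,}", line.strip())
        # Skip the header and anything that does not look like an instance row.
        if len(cols) != 5 or not cols[3].isdigit():
            continue
        totals[cols[1]] += int(cols[3])
    return dict(totals)


print(restarts_by_region(STATUS))
```

With the rows above, every restart lands in mia, which is what prompted the question.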

kaz · April 12, 2023, 5:07pm

Can you check the output of fly status --all and fly logs? It looks like your app’s health check failed multiple times.

OldhamMade · April 12, 2023, 5:11pm

Thanks for replying, Kaz. I’ve deployed since then, so I’m not sure the status will still match. I’ll keep an eye on things and resurrect this thread if it happens again.

OldhamMade · April 13, 2023, 5:21am

Hi Kaz

It happened again yesterday, only at the mia location, instance 1285de7f. I checked the logs but couldn’t really see anything.

The main concern is that, as I mentioned, this is a true micro-service. There is no state held at each node, there is no database, there aren’t even any cookies. Every single instance is the same, not only because it is the same docker image but from a runtime perspective too. The only difference I can see is that the mia region gets more requests overall than other US regions.

tj1 · April 13, 2023, 6:22am

Nothing in the logs at all? If it’s load, it could be due to memory / OOM killer.

OldhamMade · April 13, 2023, 6:32am

No, nothing that would indicate a reason for restarting. I’m going to keep an eye on it today to try and “catch it in the act”.

I don’t think it is an OOM issue: according to the metrics, memory is pretty stable. Also, this only happens in the mia region.

lillian · April 13, 2023, 10:40am

Looks like your instance 1285de7f was restarted because a health check failed. Unfortunately we don’t have much visibility into why health checks failed in the past, but looking at your app’s logs I see a lot of messages from our proxy:

could not make HTTP request to instance: connection closed before message completed

and

could not make HTTP request to instance: connection error: timed out

along with health check failures matching the timestamps of when the mia instance was restarted.

The proxy errors are from many regions, not just mia, but you did mention that you get more traffic in that region - is it possible your instances are failing to respond to some HTTP requests at high load?
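
One rough way to test that idea is to count the two proxy messages in saved log output and see whether they pile up around the restart times. This is a minimal sketch, assuming the logs have first been saved as plain text. Only the two message strings come from this thread; `count_proxy_errors` is a hypothetical helper, not a flyctl feature, and the sample lines are made up for illustration.

```python
from collections import Counter

# The two proxy messages quoted above.
PROXY_ERRORS = (
    "connection closed before message completed",
    "connection error: timed out",
)


def count_proxy_errors(lines):
    """Count how many log lines contain each known proxy error message."""
    counts = Counter()
    for line in lines:
        for msg in PROXY_ERRORS:
            if msg in line:
                counts[msg] += 1
    return counts


# Made-up sample lines; real log lines will carry more fields (timestamp, region, ...).
sample = [
    "could not make HTTP request to instance: connection error: timed out",
    "could not make HTTP request to instance: connection closed before message completed",
    "could not make HTTP request to instance: connection error: timed out",
]

for msg, n in count_proxy_errors(sample).items():
    print(n, msg)
```

If the timeouts dominate during traffic peaks, that would point towards instances being too busy to answer the health check in time rather than a problem specific to mia.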

system · April 20, 2023, 10:40am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
