Both machines of one of my apps just stopped

fkrauthan · January 15, 2025, 5:37am

Hey,

For one of my apps, I run two machines (in two different datacenters), and both of them just stopped for a 5-hour period (until I manually restarted them).

Based on the memory and CPU graphs, nothing out of the ordinary happened before the shutdown.

The app is also configured with active health checks, which I assumed would restart the instance in case of a crash. Can someone from the Fly team provide some more insight? The app in question is cfps-cors-proxy.

fkrauthan · January 17, 2025, 3:19am

Just following up so this does not get closed.

andrewmcgrath · January 17, 2025, 7:35am

That's scary… hopefully someone can provide some insight.

wjordan · January 17, 2025, 8:01am

Hi @fkrauthan,

Your application seems to be crashing quite often. By default, Machines automatically restart if they crash, but only up to the configured limit (the default restart policy is to retry 10 times within a 5-minute interval). Your machines crashed repeatedly in quick succession, hit that limit, and were left stopped instead of being restarted. (You will find a "machine has reached its max restart count of 10" line in your application logs.) I'd suggest debugging your application, but if frequent crashing is acceptable for your use case, you can either adjust the restart policy or configure autostart on your services so that a machine is always started to serve any incoming request.
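To make this failure mode concrete, here is a minimal Python sketch of a bounded restart policy. It is a simplified, hypothetical model, not Fly's actual implementation: the function name is made up, and reading "10 times within a 5-minute interval" as a sliding window is an assumption. Each crash is followed by a restart unless the restart budget for the trailing window is already used up, in which case the machine stays stopped.

```python
from collections import deque


def simulate_restart_policy(crash_times, max_retries=10, window_seconds=300):
    """Return the crash time (in seconds) at which the machine is left stopped,
    or None if every crash gets a restart.

    Hypothetical sliding-window model: at most `max_retries` restarts are
    granted within any trailing `window_seconds` interval.
    """
    recent = deque()  # timestamps of restarts granted inside the trailing window
    for t in sorted(crash_times):
        # Drop restarts that have fallen out of the trailing window.
        while recent and t - recent[0] >= window_seconds:
            recent.popleft()
        if len(recent) >= max_retries:
            return t  # budget exhausted: no restart, machine stays stopped
        recent.append(t)  # restart granted for this crash
    return None
```

Under this model, a machine crashing every 10 seconds exhausts its budget on the 11th crash (at t = 100 s), while one crashing once a minute keeps being restarted indefinitely. That matches the symptom in this thread: frequent crashes quietly end in a stopped machine rather than a restart loop.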

fkrauthan · January 17, 2025, 4:55pm

Hi @wjordan ,

Thanks for looking into it, but where exactly do you see these crashes? For some reason, the UI for my app seems to be buggy: the health check summary indicates "100 change(s) during the past 48 hours", but when I look at the actual list, first of all, it only ever seems to show one of the two nodes (which is strange in and of itself), and second, there are at most 13 events for the last 2 days.

Also, searching for "Running CORS proxy on" (the first log line after the service starts) doesn't show any excessive restarts (though still more than there should be). The service is a very simple Node.js HTTP server, and I don't see any logs indicating why it would be crashing.

I will look into that a bit more. It would be great if the UI could be fixed, though; because of it, I never noticed that there might be an actual issue.

wjordan · January 17, 2025, 5:12pm

In your application logs for the time period mentioned, you will find hundreds of crashes and restarts; you can use the Search Logs feature (see docs) to look through them.
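If you have your logs as plain text (for example, saved to a file), a rough way to gauge how often the app restarts is to count marker lines, as with the "Running CORS proxy on" search mentioned earlier. The Python sketch below assumes one log entry per line and uses the two marker strings quoted in this thread; the function name and the marker keys are hypothetical, so adjust them to your own app's log output.

```python
from collections import Counter

# Marker substrings quoted in this thread; adjust for your own app (assumption).
MARKERS = {
    "startup": "Running CORS proxy on",  # first log line after the service starts
    "max_restarts": "machine has reached its max restart count",
}


def count_markers(lines, markers=MARKERS):
    """Count how many log lines contain each marker substring."""
    counts = Counter({name: 0 for name in markers})  # report zero counts too
    for line in lines:
        for name, needle in markers.items():
            if needle in line:
                counts[name] += 1
    return counts
```

Usage: pass it the lines of a saved log file, e.g. `count_markers(open("app.log"))`, where `app.log` is a hypothetical file name. A high "startup" count over a short period points to a crash loop even when the health check UI looks quiet.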

fkrauthan · January 17, 2025, 5:59pm

Ah, interesting, I just saw that. I found the issue (pesky security-scanning bots)… Thanks for looking into it. It just felt strange since the UI was a bit inconsistent.

I think the issue is resolved.

system · January 24, 2025, 5:59pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
