Machine Check Failure Alerts

58bits · November 3, 2024, 1:33pm

Dear Fly community - we’re fairly new to fly.io. We’ve successfully deployed several apps and machines.

The http service section of our fly.toml file for all of our apps has the following:

[[http_service.checks]]
    interval = '30s'
    timeout = '5s'
    grace_period = '10s'
    method = 'GET'
    path = '/app-status'

Our containers are using supervisord to run an Nginx reverse proxy in front of a Node.js app (please - no lectures about a single process per container, we have lots of good reasons for this configuration and it’s worked well for us for years).

If the Node.js app crashes, the Nginx proxy will stay up - but the service check above will fail (our logs will also likely contain upstream proxy failure errors, 502 or 503).

My question is, how can we be notified if a machine check is failing?

We’ve yet to fully understand our options in terms of logging and notifications, but at a minimum we’d like to know if the machine check above is failing.

Thoughts or suggestions greatly appreciated.

khuezy · November 3, 2024, 2:47pm

When a fly app crashes, it usually auto restarts. What is your nginx config? It should gracefully handle proxying to services that were restarted.

58bits · November 3, 2024, 6:44pm

Hi @khuezy - thanks for the reply. I guess the problem then in our case is that our app still appears to be ‘up’ - since Nginx is still handling requests - even though the machine check is failing? I would have thought a failed check would be enough to auto-restart the app? Again - Nginx and Node.js are part of the same Docker container (and therefore Fly app).

khuezy · November 3, 2024, 6:56pm

Can I ask what the need for nginx is if it’s just a nodejs server on the same machine?

58bits · November 4, 2024, 2:05am

We use Nginx for short-ttl caching (the popular page scenario), security, redirects, custom rate limiting locations / routes and more. We’ve always put Nginx in front of our Node.js apps (even if rate limiting is off on multiple app instances). For larger projects we’d separate Nginx into a dedicated reverse proxy, but for smaller full-stack deploys, we create app instance containers that run Nginx/Node.js via supervisord.

khuezy · November 4, 2024, 2:23am

I’d still recommend breaking the nginx server out to its own app to take advantage of the Fly proxy. If not, then this sounds like an nginx config issue if it’s unable to reconnect to the local node process, right?

58bits · November 4, 2024, 2:46am

I appreciate the replies - but we’re not taking Nginx out of our standard Docker setup for full-stack apps. We have quite a bit of experience with this configuration, and it works well for us. The issue I believe is that a machine check failing on its own is not enough to restart the app at Fly.io - I think they’re looking for a failed app as well (non-zero exit, or stderr, or missing process 0 etc.) in addition to the failed check. I might be wrong, but we’ll look at it more closely. Nginx is doing the right thing and returning an upstream proxy error - but it’s still running, and so as far as Fly.io is concerned the app is still up.

khuezy · November 4, 2024, 3:03pm

Then perhaps the problem is supervisord not restarting your nodejs app when it crashes.

58bits · November 4, 2024, 3:18pm

Oh my gosh you’re a star. That’s spot on. autorestart was set to false. Thank you!
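
For readers who hit the same symptom, here is a minimal sketch of what the relevant supervisord section might look like with the fix applied. The program names, the commands, and the `/app/server.js` path are illustrative assumptions and are not taken from the poster's actual configuration; only the `autorestart` setting comes from this thread.

```ini
; Hypothetical supervisord.conf fragment (names and paths are assumptions).
[supervisord]
nodaemon=true

[program:nginx]
command=nginx -g "daemon off;"
autorestart=true

[program:node]
; Assumed entry point for the Node.js app.
command=node /app/server.js
; With autorestart=false, a crashed Node.js process stays down while Nginx
; keeps answering with upstream errors. Setting it to true makes supervisord
; restart the process whenever it exits.
autorestart=true
```

supervisord also accepts `autorestart=unexpected` (its default), which restarts the process only when it exits with a code not listed in `exitcodes`. Whether `true` or `unexpected` is the better fit depends on how the app signals a clean shutdown.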

system · November 11, 2024, 3:19pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
