Machine Check Failure Alerts

Dear Fly community - we’re fairly new to fly.io. We’ve successfully deployed several apps and machines.

The http service section of our fly.toml file for all of our apps has the following:

[[http_service.checks]]
    interval = '30s'
    timeout = '5s'
    grace_period = '10s'
    method = 'GET'
    path = '/app-status'

Our containers are using supervisord to run an Nginx reverse proxy in front of a Node.js app (please - no lectures about a single process per container, we have lots of good reasons for this configuration and it’s worked well for us for years :wink:).

If the Node.js app crashes, the Nginx proxy will stay up - but the service check above will fail (our logs will also likely contain upstream proxy failure errors, 502 or 503).

My question is, how can we be notified if a machine check is failing?

We’ve yet to fully understand our options in terms of logging and notifications, but at a minimum we’d like to know if the machine check above is failing.

Thoughts or suggestions greatly appreciated.

When a fly app crashes, it usually auto restarts. What is your nginx config? It should gracefully handle proxying to services that were restarted.

Hi @khuezy - thanks for the reply. I guess the problem then in our case is that our app still appears to be ‘up’ - since Nginx is still handling requests - even though the machine check is failing? I would have thought a failed check would be enough to auto-restart the app? Again - Nginx and Node.js are part of the same Docker container (and therefore Fly app).

Can I ask what the need for nginx is if it’s just a nodejs server on the same machine?

We use Nginx for short-ttl caching (the popular page scenario), security, redirects, custom rate limiting locations / routes and more. We’ve always put Nginx in front of our Node.js apps (even if rate limiting is off on multiple app instances). For larger projects we’d separate Nginx into a dedicated reverse proxy, but for smaller full-stack deploys, we create app instance containers that run Nginx/Node.js via supervisord.

I’d still recommend breaking the nginx server out to its own app to take advantage of the Fly proxy. If not, then this sounds like an nginx config issue if it’s unable to reconnect to the local node process, right?

I appreciate the replies - but we’re not taking Nginx out of our standard Docker setup for full-stack apps. We have quite a bit of experience with this configuration, and it works well for us. The issue I believe is that a machine check failing on its own is not enough to restart the app at Fly.io - I think they’re looking for a failed app as well (non-zero exit, or stderr, or missing process 0 etc.) in addition to the failed check. I might be wrong, but we’ll look at it more closely. Nginx is doing the right thing and returning an upstream proxy error - but it’s still running, and so as far as Fly.io is concerned the app is still up.

Then perhaps the problem is supervisord not restarting your nodejs app when it crashes.


Oh my gosh you’re a star. That’s spot on. autorestart was set to false. Thank you!
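
For anyone who lands here with the same symptom, here is a minimal sketch of the kind of supervisord program section involved. The program names, commands, and paths are hypothetical placeholders, not our actual config; the only setting that mattered in our case was autorestart.

```ini
; Hypothetical supervisord.conf fragment. Program names, commands and
; paths are illustrative assumptions, not taken from a real deployment.
[program:nginx]
command=nginx -g "daemon off;"
autorestart=true

[program:node]
command=node /app/server.js
directory=/app
autostart=true
; Restart the Node.js process whenever it exits. With autorestart=false,
; supervisord leaves it down, so Nginx stays up and keeps returning
; 502/503 while the /app-status check fails.
autorestart=true
```

supervisord also accepts autorestart=unexpected, which restarts the process only when it exits with a code not listed in exitcodes. Which of the two fits better depends on how the app is expected to exit.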

