We have some memory issues, and while debugging I learned about the dmesg --ctime command (thanks @qqwy). The output of this command shows crashes that I haven’t received emails about. My hypothesis is that these notifications are sent at most once a day. Is this true? I looked at my mailbox and I’ve never gotten more than one such email per day.
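For anyone who wants to cross-check crashes against emails, here is a rough Python sketch that pulls OOM kills out of saved dmesg --ctime output. It assumes the kernel's usual "Killed process <pid> (<name>)" wording; the function name and the sample log lines are made up for illustration and are not real output from any app.

```python
import re

# Matches a bracketed --ctime timestamp followed by the OOM killer's
# "Killed process <pid> (<name>)" report on the same line.
OOM_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\].*Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
)

def find_oom_kills(dmesg_text):
    """Return (timestamp, pid, process name) for each OOM kill found."""
    kills = []
    for line in dmesg_text.splitlines():
        m = OOM_RE.search(line)
        if m:
            kills.append((m.group("when"), int(m.group("pid")), m.group("name")))
    return kills

# Illustrative sample only (hypothetical log lines).
SAMPLE = (
    "[Mon Mar  6 10:15:02 2023] eth0: link up\n"
    "[Mon Mar  6 11:42:17 2023] Out of memory: Killed process 612 (worker) "
    "total-vm:1048576kB, anon-rss:204800kB\n"
)

if __name__ == "__main__":
    for when, pid, name in find_oom_kills(SAMPLE):
        print(when, pid, name)
```

Saving the output of dmesg --ctime to a file and passing its contents to find_oom_kills should give a list to compare with the notification emails.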
That’s correct. Just checked the code to confirm. We only send this at most once per day.
I can tweak that to send it at most once per day per VM size (so if you do change your VM size, you’ll get a new email if it still OOMs). Or maybe every hour?
Some apps will OOM a lot, that’s why we put in a limit at first.
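To make the "once per day per VM size" idea concrete, here is a minimal Python sketch of that kind of throttle. This is not our actual notification code; the class name, the (app, VM size) key, and the in-memory dictionary are all assumptions for illustration.

```python
from datetime import datetime, timedelta

class OomEmailThrottle:
    """Allow at most one email per (app, VM size) within a time window."""

    def __init__(self, window=timedelta(days=1)):
        self.window = window
        self._last_sent = {}  # (app, vm_size) -> time of the last email

    def should_send(self, app, vm_size, now):
        key = (app, vm_size)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            return False  # already notified for this VM size recently
        self._last_sent[key] = now
        return True

if __name__ == "__main__":
    throttle = OomEmailThrottle()
    start = datetime(2023, 3, 6, 12, 0)
    print(throttle.should_send("foobar", "shared-cpu-1x", start))
    print(throttle.should_send("foobar", "shared-cpu-1x", start + timedelta(hours=3)))
    print(throttle.should_send("foobar", "shared-cpu-2x", start + timedelta(hours=3)))
```

Because the VM size is part of the key, a resize starts a fresh window. Passing window=timedelta(hours=1) would give the hourly variant instead.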
Honestly I’m not sure what I want. In my case I think I would like to at least be aware of all crashes. Maybe an option would be to send emails once a day, but include all the crashes since the last email.
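Roughly what I have in mind, as a Python sketch: collect every crash and send one summary per interval. All names here are hypothetical and I have no idea how the real notifier is built, so treat it as an illustration of the idea only.

```python
from datetime import datetime, timedelta

class OomDigest:
    """Collect crashes and emit one summary per interval."""

    def __init__(self, interval=timedelta(days=1)):
        self.interval = interval
        self.pending = []      # (time, process) pairs not yet reported
        self.last_sent = None  # time of the previous digest, if any

    def record(self, when, process):
        self.pending.append((when, process))

    def flush_if_due(self, now):
        """Return the digest body if one is due, otherwise None."""
        if not self.pending:
            return None
        if self.last_sent is not None and now - self.last_sent < self.interval:
            return None
        lines = [f"{when.isoformat()} {process}" for when, process in self.pending]
        body = f"{len(lines)} OOM kill(s) since the last email:\n" + "\n".join(lines)
        self.pending.clear()
        self.last_sent = now
        return body

if __name__ == "__main__":
    digest = OomDigest()
    start = datetime(2023, 3, 6, 12, 0)
    digest.record(start, "worker-1")
    print(digest.flush_if_due(start))
```

The first crash still produces an email right away; later crashes are held back and listed together once the interval has passed.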
Some other feedback:
“Your "foobar" application hosted on Fly.io crashed because it ran out of memory.”
It’s actually a bit unclear what this means. The first thing I think when I read this: our main Docker process (or whatever that’s called) crashed and our VM restarted. That is not the case, though. We run supervisord to manage a bunch of workers, and we get this message even if only one of the workers crashes. I like that we are notified, but I think some more information would be welcome.
Also, the suggested commands for scaling (e.g. fly scale vm shared-cpu-1x --memory 1024 -a foo-bar) don’t take into account your app having multiple processes.
Also, I would love some more info that we can use to debug. Which VM ID? Which process group? Perhaps even show us the command we can use to log in to that particular VM.
We’re now tracking this internally. Sounds like we should make a change here.
I can’t give you an ETA on that. Maybe the easiest fix right now is to set it to send an email hourly.