256MB Machine OOM & Fly's hallpass daemon

Given the recent Out Of Memory (OOM) errors on 256MB Machines, e.g. @PietroPan’s here & @vrnithinkumar’s here, I have been wondering whether the one item in a Fly VM that isn’t under an Organization’s control, Fly’s hallpass daemon (inserted at runtime?), has recently been updated.

If this is the case (?), is it possible that hallpass is now pushing some 256MB Machines, ones that had previously been running close to but not at OOM, into OOM errors?

fly ssh console -C "sh -c 'date;uptime;ps -aux | grep VSZ | grep -v grep;ps -aux | grep hallpass | grep -v grep;md5sum /.fly/hallpass;ls -l /.fly/hallpass;export | grep FLY_VM_MEMORY_MB'" --app insert_app_name_here

Example output from an app’s VM running for 58 days:

Mon Aug 19 10:46:34 UTC 2024
 10:46:34 up 58 days,  1:42,  0 users,  load average: 0.00, 0.00, 0.00
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       514  0.0  2.1 713372  4956 ?        Sl   Jun22  43:06 /.fly/hallpass
a6cc0fe88825332c6483d91cac6a3d6e  /.fly/hallpass
-rwxr-xr-x 1 root root 5864579 Jun 22 09:03 /.fly/hallpass
export FLY_VM_MEMORY_MB='256'

Example output from another app’s VM, running for 2 days:

Mon Aug 19 10:49:35 UTC 2024
 10:49:35 up 2 days,  9:43,  0 user,  load average: 0.00, 0.00, 0.00
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root       318  0.0  1.2 1228836 2768 ?        Sl   Aug17   1:53 /.fly/hallpass
e6f93fc4ce80042063929614e1bd380b  /.fly/hallpass
-rwxr-xr-x 1 root root 6333298 Aug 17 01:06 /.fly/hallpass
export FLY_VM_MEMORY_MB='256'
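The VSZ and RSS columns that `ps` prints can also be read straight from the kernel, which avoids depending on how a given `ps` build formats its output. A minimal sketch, assuming a Linux guest with `/proc` mounted and `awk` in the image; the helper name `proc_mem` is hypothetical, and `pgrep` (used only in the usage comment) may not exist in every image.

```shell
#!/bin/sh
# proc_mem (hypothetical helper): print the VmSize line (corresponds to the
# VSZ column in ps) and the VmRSS line (the RSS column) from a
# /proc/<pid>/status file. Both values are reported in kB (KiB).
proc_mem() {
    awk '$1 == "VmSize:" || $1 == "VmRSS:" { print $1, $2, $3 }' "$1"
}

# Usage inside the VM (assumes pgrep exists and exactly one hallpass runs):
#   proc_mem "/proc/$(pgrep -f /.fly/hallpass)/status"
```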

VSZ on the previous hallpass is 713372 (KiB), and 1228836 on the newer version. Whether or why this would make any difference to OOMs on Fly I do not know, but it is something that appears to have changed (by roughly +72%) with the more recent version of hallpass (MD5 hash e6f93fc4ce80042063929614e1bd380b).
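`ps` reports VSZ in KiB. Converting the two figures shows that VSZ is virtual address space rather than memory in use: both values are well above the 256MB of RAM these VMs have, so most of that space cannot be resident. A sketch in plain `sh` plus `awk` (helper names are hypothetical) that also recomputes the size of the increase:

```shell
#!/bin/sh
# kib_to_mib (hypothetical helper): ps reports VSZ and RSS in KiB.
kib_to_mib() {
    awk -v k="$1" 'BEGIN { printf "%.1f\n", k / 1024 }'
}

# pct_change (hypothetical helper): percentage change from $1 to $2.
pct_change() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

old_vsz=713372    # hallpass md5 a6cc0fe88825332c6483d91cac6a3d6e (Jun 22 binary)
new_vsz=1228836   # hallpass md5 e6f93fc4ce80042063929614e1bd380b (Aug 17 binary)

echo "old VSZ: $(kib_to_mib "$old_vsz") MiB"
echo "new VSZ: $(kib_to_mib "$new_vsz") MiB"
echo "change:  +$(pct_change "$old_vsz" "$new_vsz")%"
```

This prints 696.7 MiB, 1200.0 MiB and +72.3%: roughly 0.7 and 1.2 GiB of address space on a 256MB machine.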


We’re paying attention to this, but comparing old and new Hallpass memory utilization, I’m seeing roughly consistent (and low) resident memory usage.

We lost ~3MB of total memory on the 256MB machines; it was 217MB, now it’s 213MB.

I’m not sure which stats @khuezy took the 217MB to 213MB drop from. Firecracker Memory Usage in Fly’s App Dashboard Metrics?
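One figure that is easy to compare across restarts is the guest kernel’s own view of usable RAM, `MemTotal` in `/proc/meminfo`. It is normally a little below the nominal VM size because the kernel reserves some memory for itself. Whether this is the statistic behind the 217MB and 213MB figures isn’t known from this thread. A minimal sketch, assuming a Linux guest with `/proc` mounted; `mem_total_mib` is a hypothetical name.

```shell
#!/bin/sh
# mem_total_mib (hypothetical helper): read MemTotal (reported in kB) from a
# meminfo-format file and print it in MiB. Defaults to /proc/meminfo.
mem_total_mib() {
    awk '$1 == "MemTotal:" { printf "%.1f\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Inside the VM (e.g. from a fly ssh console session):
#   mem_total_mib
```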

From Fly’s side, now that there is a possible number (~4MB), would you be able to check Memory Usage etc. for 256MB apps that were restarted before and after the hallpass update(s)?

Though there may be a range of causes for differing memory usage (e.g. the apps in question), if there appears to be a fairly consistent (~4MB) pre- and post-update pattern across a range of apps, this may point at the hallpass update as the cause (albeit other Fly things may also have changed around the same time).
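The check suggested above boils down to computing, per app, the memory drop between a restart before and a restart after the hallpass update, then seeing whether those drops cluster around one value. A sketch, assuming a hypothetical whitespace-separated input with one line per app (`app before_MB after_MB`). It reports each delta plus the mean and the min/max spread, since a tight spread around ~4MB is what would hint at a common cause.

```shell
#!/bin/sh
# summarise_deltas (hypothetical helper). Input lines: <app> <before_MB> <after_MB>
# Reads the files given as arguments, or stdin if none.
# Prints the per-app drop, then the mean, minimum and maximum drop.
summarise_deltas() {
    awk '
        NF == 3 {
            d = $2 - $3
            printf "%s %.1f\n", $1, d
            sum += d; n++
            if (n == 1 || d < min) min = d
            if (n == 1 || d > max) max = d
        }
        END {
            if (n > 0) printf "mean %.1f min %.1f max %.1f\n", sum / n, min, max
        }
    ' "$@"
}

# Usage (file name is hypothetical):
#   summarise_deltas before_after.txt
# The one data point in this thread, 217MB -> 213MB, is a drop of 4.0.
```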

I appreciate that complaints from the peanut gallery (256MB), especially over something as possibly trifling as a mere 4MB of memory, may be disproportionate to the revenue those apps bring in. The counter-argument, though, if a hallpass update (or another Fly change) turns out to be the root cause, is that being forced into a 100% memory uplift for a problem that may not be of an Organization’s own making isn’t a small change.

I hope that the ‘solution’ isn’t to deprecate 256MB machines :cry:.

Hey man, 3MB is a lot when you’re talking about ~200MB

Hi @whistler,

Can you confirm the kernel version (uname -a) of the machine that has the larger hallpass?

To clarify, that’d be the old machine (58 days up); actual RAM usage is higher on that one (RSS 4956 KiB) than on the new machine (RSS 2768 KiB), even though the newer binary is slightly larger and its VSZ is also larger.
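The RSS comparison can be made concrete. RSS, roughly the part of a process held in physical RAM, is a closer measure than VSZ of what hallpass costs a small machine. A sketch (the helper name is hypothetical, and the machine’s 256MB is treated as 256 MiB; the guest actually sees somewhat less) converting both values:

```shell
#!/bin/sh
# rss_report (hypothetical helper): given an RSS value in KiB (as ps prints it),
# print it in MiB and as a share of a 256 MiB machine.
rss_report() {
    awk -v k="$1" 'BEGIN { printf "%.1f MiB (%.1f%% of 256 MiB)\n", k / 1024, k / (256 * 1024) * 100 }'
}

echo "old hallpass RSS: $(rss_report 4956)"   # 58-day machine
echo "new hallpass RSS: $(rss_report 2768)"   # 2-day machine
echo "difference: $(awk 'BEGIN { printf "%.1f", (4956 - 2768) / 1024 }') MiB less resident"
```

This gives 4.8 MiB versus 2.7 MiB, so the newer hallpass was about 2.1 MiB smaller in RAM at the time of these snapshots.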

Thanks,

  • Daniel

From the machine that was up 2 days (now 6):

e6f93fc4ce80042063929614e1bd380b /.fly/hallpass
-rwxr-xr-x 1 root root 6333298 Aug 17 01:06 /.fly/hallpass
Linux 6e82920fe046e8 5.15.98-fly #g534f603e72 SMP Fri Aug 9 18:17:05 UTC 2024 x86_64 GNU/Linux

From the machine that was up 58 days (now 62):

a6cc0fe88825332c6483d91cac6a3d6e /.fly/hallpass
-rwxr-xr-x 1 root root 5864579 Jun 22 09:03 /.fly/hallpass
Linux 3d8d9349b95528 5.12.2 #1 SMP Mon Oct 3 13:48:56 UTC 2022 x86_64 GNU/Linux

Note: I’m not suggesting either of the above apps or machines have a problem - just that there may be a problem with some 256MB Machines, and trying to consider what common factor(s) may have changed recently. Hallpass was one thing I could think of (that is, if it has changed recently).
