Following the recent Out Of Memory errors on 256MB Machines, e.g. @PietroPan’s here and @vrnithinkumar’s here, I have been wondering whether the one item in a Fly VM that isn’t under an Organization’s control, Fly’s hallpass daemon (inserted at runtime?), has recently been updated.
If so, is it possible that the newer hallpass is now pushing some 256MB Machines, those that were previously running close to the limit without going OOM, into OOM errors?
To compare hallpass between VMs, I ran the following (replace insert_app_name_here with the app’s name):
fly ssh console -C "sh -c 'date;uptime;ps -aux | grep VSZ | grep -v grep;ps -aux | grep hallpass | grep -v grep;md5sum /.fly/hallpass;ls -l /.fly/hallpass;export | grep FLY_VM_MEMORY_MB'" --app insert_app_name_here
Example output from an app’s VM running for 58 days:
Mon Aug 19 10:46:34 UTC 2024
10:46:34 up 58 days, 1:42, 0 users, load average: 0.00, 0.00, 0.00
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 514 0.0 2.1 713372 4956 ? Sl Jun22 43:06 /.fly/hallpass
a6cc0fe88825332c6483d91cac6a3d6e /.fly/hallpass
-rwxr-xr-x 1 root root 5864579 Jun 22 09:03 /.fly/hallpass
export FLY_VM_MEMORY_MB='256'
Example output from another app’s VM, running for 2 days:
Mon Aug 19 10:49:35 UTC 2024
10:49:35 up 2 days, 9:43, 0 user, load average: 0.00, 0.00, 0.00
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 318 0.0 1.2 1228836 2768 ? Sl Aug17 1:53 /.fly/hallpass
e6f93fc4ce80042063929614e1bd380b /.fly/hallpass
-rwxr-xr-x 1 root root 6333298 Aug 17 01:06 /.fly/hallpass
export FLY_VM_MEMORY_MB='256'
VSZ for the older hallpass is 713372 KiB; for the newer version it is 1228836 KiB. Whether this makes any difference to OOMs on Fly, and if so why, I do not know. It is, however, something that appears to have changed (roughly +72%) with the more recent version of hallpass (MD5 hash e6f93fc4ce80042063929614e1bd380b).
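For anyone wanting to repeat the comparison with their own numbers, here is a minimal sketch of the arithmetic. The pct_change helper is a hypothetical name introduced only for this illustration; the figures are copied from the two outputs above.

```shell
# pct_change OLD NEW: percentage change from OLD to NEW, to one decimal place.
# (Hypothetical helper for illustration; not part of flyctl or hallpass.)
pct_change() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

pct_change 713372 1228836    # VSZ in KiB, older vs newer hallpass
pct_change 4956 2768         # RSS in KiB, older vs newer hallpass
pct_change 5864579 6333298   # binary size in bytes, from ls -l
```

Note that VSZ counts reserved virtual address space rather than memory actually resident in RAM, so the RSS column (4956 KiB on the older VM, 2768 KiB on the newer one) may be the more relevant figure when judging pressure against the 256MB limit.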