Hello everyone,
I’m encountering frequent Out-of-Memory (OOM) issues with my Node.js application, specifically a Remix application with an Express server. Despite various attempts to diagnose and resolve the problem, I’m still unclear about its root cause, and I would greatly appreciate your insights and suggestions.
Background:
Recently, I began seeing erratic errors in my app, such as hanging or crashing in unexpected places that aren’t typically performance-intensive. I then received notifications from fly.io indicating that the app had run out of memory. On further investigation, I found that the OOM occurs consistently on one specific page the first time it is loaded after the machine starts, but not again after a restart.
Details and Approaches Tried:
OOM Kill Logs:
[ 16.369753] Out of memory: Killed process 342 (node) total-vm:21741816kB, anon-rss:80896kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:2316kB oom_score_adj:0
[ 15.666088] Out of memory: Killed process 342 (node) total-vm:21746452kB, anon-rss:83100kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1948kB oom_score_adj:0
[ 18.665771] Out of memory: Killed process 342 (node) total-vm:21746900kB, anon-rss:83100kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1976kB oom_score_adj:0
These didn’t happen in sequence; I just put them together for better readability.
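To make these figures easier to compare with my local measurements, here is a small sketch that pulls the kB values out of one of the log lines and converts them. The function name parseOomLine and the regular expression are my own; the line format is assumed only from the three lines above.

```javascript
// Parse a kernel OOM-kill line and convert its kB figures to MiB.
// parseOomLine is a hypothetical helper; the pattern is an assumption
// based solely on the three log lines quoted in this post.
function parseOomLine(line) {
  const field = (name) => {
    const match = line.match(new RegExp(`${name}:(\\d+)kB`));
    return match ? Number(match[1]) : null;
  };
  const toMiB = (kB) => (kB === null ? null : kB / 1024);
  return {
    totalVmMiB: toMiB(field("total-vm")),
    anonRssMiB: toMiB(field("anon-rss")),
    fileRssMiB: toMiB(field("file-rss")),
    pgtablesMiB: toMiB(field("pgtables")),
  };
}

const sample =
  "[ 16.369753] Out of memory: Killed process 342 (node) total-vm:21741816kB, " +
  "anon-rss:80896kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:2316kB oom_score_adj:0";

const parsed = parseOomLine(sample);
console.log(`anon-rss: ${parsed.anonRssMiB.toFixed(1)} MiB`);
console.log(`total-vm: ${(parsed.totalVmMiB / 1024).toFixed(1)} GiB`);
```

If I read this correctly, the resident memory (anon-rss) at the time of the kill was about 79 MiB, while total-vm is virtual address space of roughly 20.7 GiB rather than memory actually in use.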
Grafana Metrics:
Local Checking Results:
I ran some checks locally, and the OOM never happened with the same interactions.
- Local checks with node --inspect and chrome://inspect. Snapshot 1 was taken after the app started and before any connection. I then recorded an allocation timeline while going through the interactions and took a few more snapshots afterward; Snapshot 6 was taken about an hour later.
- With the top command, MEM is usually around 100M and at most around 160M.
- With /usr/bin/time -l:
35.76 real 2.08 user 0.28 sys
170246144 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
14004 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
497 messages sent
258 messages received
1 signals received
15 voluntary context switches
16769 involuntary context switches
11397507944 instructions retired
5356333425 cycles elapsed
126870592 peak memory footprint
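One thing I could add to the server to get comparable numbers from the deployed machine is a periodic memory log based on process.memoryUsage(). This is only a sketch: the helper names (toMiB, snapshotMemory, startMemoryLogging), the 5-second interval, and the log format are my own choices, not anything from Remix, Express, or fly.io.

```javascript
// Periodically log Node's own view of its memory use so it can be compared
// with the kernel's OOM report. All helper names here are hypothetical.
const toMiB = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;

function snapshotMemory() {
  // process.memoryUsage() reports these fields in bytes.
  const { rss, heapTotal, heapUsed, external, arrayBuffers } = process.memoryUsage();
  return {
    rssMiB: toMiB(rss),
    heapTotalMiB: toMiB(heapTotal),
    heapUsedMiB: toMiB(heapUsed),
    externalMiB: toMiB(external),
    arrayBuffersMiB: toMiB(arrayBuffers),
  };
}

function startMemoryLogging(intervalMs = 5000) {
  const timer = setInterval(() => {
    console.log("[mem]", JSON.stringify(snapshotMemory()));
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for logging
  return timer;
}

// The two byte counts reported by /usr/bin/time -l above, for comparison.
console.log("max RSS (MiB):", toMiB(170246144));
console.log("peak footprint (MiB):", toMiB(126870592));
```

Calling startMemoryLogging() once at server startup would be enough. Locally, the maximum resident set size works out to roughly 162 MiB and the peak footprint to roughly 121 MiB, which matches what top showed.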
Seeking Advice:
I’m relatively new to this field and hope the above details provide sufficient insights into my situation.
- Do you suspect a memory leak in my application, or does it simply need more memory?
- What steps would you recommend for further diagnosing and pinpointing the root cause of these intermittent OOM errors?
Thank you in advance for your help and expertise!