Node.js App Out-of-Memory

Hello everyone,

I’m encountering frequent Out-of-Memory (OOM) issues with my Node.js application, specifically a Remix application with an Express server. Despite various attempts to diagnose and resolve the problem, I’m still unclear about its root cause, and I would greatly appreciate your insights and suggestions.

Background:

Recently, I began experiencing erratic errors in my app: hangs or crashes in unexpected places that aren’t typically performance-intensive. Subsequently, I received notifications from fly.io indicating that the app had encountered OOM conditions. On further investigation, I observed that the OOM occurs consistently on a specific page the first time it is visited after the machine starts, but not on subsequent visits until the next restart.

Details and Approaches Tried:

OOM Kill Logs:

[   16.369753] Out of memory: Killed process 342 (node) total-vm:21741816kB, anon-rss:80896kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:2316kB oom_score_adj:0
[   15.666088] Out of memory: Killed process 342 (node) total-vm:21746452kB, anon-rss:83100kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1948kB oom_score_adj:0
[   18.665771] Out of memory: Killed process 342 (node) total-vm:21746900kB, anon-rss:83100kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1976kB oom_score_adj:0

These didn’t happen in sequence; I just put them together for readability.

Grafana Metrics:

Local Checking Results:
I ran some checks locally; the OOM never happened with the same interactions.

  • Local checks with node --inspect and chrome://inspect: Snapshot 1 was taken after the app started, before any connection; I then recorded an allocation timeline while going through the interactions and took a few more snapshots afterwards, with Snapshot 6 taken about an hour later (a scripted way to capture comparable snapshots is sketched after this list).

  • With top, the MEM column is usually around 100 MB, at most around 160 MB.

  • With /usr/bin/time -l:

       35.76 real         2.08 user         0.28 sys
           170246144  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
               14004  page reclaims
                   0  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                 497  messages sent
                 258  messages received
                   1  signals received
                  15  voluntary context switches
               16769  involuntary context switches
         11397507944  instructions retired
          5356333425  cycles elapsed
           126870592  peak memory footprint
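For reference, here is a scripted way to capture comparable heap snapshots without keeping Chrome DevTools attached. This is only a minimal sketch: the SIGUSR2 trigger is an illustrative choice, not something my app actually uses. The resulting .heapsnapshot files can be loaded into the DevTools Memory tab and diffed the same way as the numbered snapshots above.

    const v8 = require('node:v8');

    // Write a heap snapshot whenever the process receives SIGUSR2,
    // e.g. `kill -USR2 <pid>` between interactions with the app.
    // The files land in the working directory.
    process.on('SIGUSR2', () => {
      const file = v8.writeHeapSnapshot(); // returns the generated filename
      console.log(`heap snapshot written to ${file}`);
    });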

Seeking Advice:

I’m relatively new to this field and hope the above details provide sufficient insights into my situation.

  • Do you suspect a memory leak in my application, or does it simply need more memory?
  • What steps would you recommend for further diagnosing and pinpointing the root cause of these intermittent OOM errors?

Thank you in advance for your help and expertise!

If it happens on a specific page right after the machine starts, I would not suspect a memory leak as the cause. Either you need to experiment a bit to get Node’s garbage collection to kick in before the crash, or you need more memory.

You can find some suggestions here: Scaling · Fly Docs
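To make the first suggestion concrete, here is a minimal sketch; the 192 MB value and the 256 MB machine size are assumptions for illustration, not figures from this thread. Node doesn’t size its heap to a small Fly machine on its own, so it’s worth printing the limit V8 is actually running with and, if it’s larger than the machine’s RAM, capping it so garbage collection kicks in before the kernel’s OOM killer does.

    const v8 = require('node:v8');

    // heap_size_limit is the most V8 will let the JS heap grow to.
    // If it is far above the machine's RAM, the kernel OOM killer can
    // fire before V8 ever feels enough pressure to collect aggressively.
    const limitMB = v8.getHeapStatistics().heap_size_limit / 1024 / 1024;
    console.log(`V8 heap limit: ${limitMB.toFixed(0)} MB`);

    // Starting node with e.g. `--max-old-space-size=192` on a 256 MB
    // machine keeps the heap below the machine's RAM; the exact value
    // is something you would have to tune.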


Sounds like there’s a memory leak either in your app or in a 3rd-party library you’re using. Have you tried running your app locally in production mode (the same build as your deployed app)?

I had this happen to me a few months ago with the Turso DB client. In dev mode there were no visible leaks, because memory was constantly being cleaned up when reloading/refreshing pages. Once I ran it as a production build, I could see the leak locally. Once the Turso team was notified, they fixed it pretty quickly.
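One way to see it locally (a rough sketch; the 10-second interval is arbitrary) is to run the production build and log memory while clicking through the app. A leak shows up as RSS climbing steadily across interactions instead of levelling off.

    // Log memory every 10s while exercising the production build locally.
    setInterval(() => {
      const { rss, heapUsed, heapTotal, external } = process.memoryUsage();
      const mb = (n) => (n / 1024 / 1024).toFixed(1);
      console.log(
        `rss=${mb(rss)}MB heap=${mb(heapUsed)}/${mb(heapTotal)}MB external=${mb(external)}MB`
      );
    }, 10_000).unref(); // don't keep the process alive just for this timer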

I was using production mode throughout my checks.
I’m also using the Turso DB client; it sometimes errors, but I guess that could be expected with insufficient memory?

Hey, thanks!!
I’ll look into scaling.

I have a follow-up question:
How can one tell whether the memory isn’t enough?

The thing is, I don’t see any hint suggesting that I have insufficient memory, because from the metrics it looks to me like the app isn’t using all of its memory and there is still some headroom.
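For what it’s worth, here is a quick check I could run inside the Fly machine to compare the process against the machine’s RAM. It is just a sketch; os.totalmem() reports the whole VM’s memory, which the kernel and any other processes also use, so the real headroom is somewhat less than the difference printed here.

    const os = require('node:os');

    // Compare what the machine has against what this process is holding.
    const mb = (n) => (n / 1024 / 1024).toFixed(0);
    const { rss } = process.memoryUsage();
    console.log(`machine: ${mb(os.totalmem())} MB total, ${mb(os.freemem())} MB free`);
    console.log(`this node process rss: ${mb(rss)} MB`);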

If there’s a leak, it won’t matter how much memory you give it; it’ll eventually run out and crash. What are your Turso DB and libsql versions? The memory leak was fixed some time in Nov/Dec 2023.

"@libsql/client": "0.4.3"

I’m asking because I want to know how to read and diagnose the metrics correctly.
The OOM happened right after the machine started rather than after it had been running for a while, so it doesn’t feel like a memory leak.
Another thing is that I’m unclear about why I’m having memory issues at all, because in my eyes the metrics show that I have enough memory.

Oh yeah… that’s really old and has the memory leak, IIRC. Also update your Turso DB.

It’s a version from around January this year, so I think the memory leak you mentioned is fixed in it? And I checked the DB; it seems to update automatically.

Anyway, I will update my DB. Thanks for the suggestion!

Yeah, you’re right, the leak was fixed in the 0.4.0 RC, so it should be fixed in 0.4.3.
Something else is the culprit.

