I have been working on a small project to reproduce a memory-usage issue in Next.js. The project is built on the Next.js canary version 13.4.3-canary.1, uses the App Router and Server Actions, and does not use a database.
The problem shows up when the project is deployed and its memory usage is observed on different platforms. For testing, I have deployed it to both Vercel and Fly.io.
On Vercel: https://next-js-ram-example.vercel.app/
When interacting with the deployed version on Vercel, the project responds as expected: memory usage remains stable, with no significant increase and no noticeable latency.
On Fly.io: https://memory-test.fly.dev/
However, when the project is deployed on Fly.io, memory usage constantly sits around 220 MB, even under normal usage. Furthermore, when I click items in the navigation bar to switch between pages, the app appears unresponsive and fails to navigate smoothly.
I expect this small project to run smoothly on Fly.io without any memory-related issues. Given my previous successful Fly.io deployment of an app that used more resources and also ran Next.js 13 with the App Router and Server Actions, my expectation is that memory usage should remain stable and within acceptable limits.
Thank you for your response and the information provided. Unfortunately, I’m still experiencing an issue with memory usage when deploying the project on Fly.io. Despite my previous successful deployment on Fly.io using additional resources and an earlier version of Next.js, the project continues to encounter high memory usage and eventually gets killed due to running out of memory. This issue is preventing the project from running smoothly on the platform.
Here’s an excerpt from the recent logs:
2023-05-24T11:57:16.113 app[6e82dd75c45d87] arn [info] [ 354.343753] Out of memory: Killed process 513 (node) total-vm:803956kB, anon-rss:54272kB, file-rss:4kB, shmem-rss:0kB, UID:0 pgtables:1140kB oom_score_adj:0
2023-05-24T11:57:16.313 app[6e82dd75c45d87] arn [info] Starting clean up.
2023-05-24T11:57:16.313 app[6e82dd75c45d87] arn [info] Process appears to have been OOM killed!
2023-05-24T11:57:17.313 app[6e82dd75c45d87] arn [info] [ 355.545338] reboot: Restarting system
2023-05-24T11:57:17.410 app[6e82dd75c45d87] arn [info] Out of memory: Killed process
2023-05-24T11:57:17.458 runner[6e82dd75c45d87] arn [info] machine did not have a restart policy, defaulting to restart
In the issue I created on the Next.js GitHub repository (#49929), I received a response suggesting that this behavior might be related to a memory leak. Here is the comment in question.
I would like to add that the amount of RAM reported by fly machine status id-xxx-xxx-xx is not the same as what is actually available inside the system. You can inspect the available RAM by running grep MemTotal /proc/meminfo while connected via flyctl ssh console:
So if your machine has 256MB from the free allowance, your app effectively has to run under about 225MB. I am not sure if this behaviour is new, but recently all my Next.js apps on free allowances are way slower than usual.
This can of course be a Next.js leak, but it may also be that the amount of available RAM is less than advertised.
@jerome @shortdiv would it be possible to get an official Fly.io team response to these memory problems? Deploying an app with the current Next.js version to a Hobby plan machine currently leads to it continuously crashing.
Also - has the default amount of memory changed from 256MB to 225MB like @josehower mentioned above?
According to the issue on the Next.js repo, there's a memory leak from the event emitter. I bet most people run their Next.js apps in serverless environments, where instances are short-lived and recycled before they hit these issues? I'm not certain.
Adding swap won't help forever if there is a memory leak; it'll just take longer before the app runs out of memory.
As mentioned before, since Vercel runs on AWS Lambda, there are many differences, including the fact that the virtual machines are constantly being stopped and started, preventing the accumulation of leaked memory.
I looked at apps v1 platform VMs and they seem to have the same discrepancy. I’m looking around a bit, but I’m not sure where it comes from. We’re passing in 256MiB to firecracker, I am sure of that. So the discrepancy is created somewhere else, either from firecracker or the kernel? Not sure.
Looks like the total memory is correct here: 261752K / 1024 = ~256MiB
The “missing” memory is reserved by various kernel things (see the kernel code, rwdata, etc. in that log line). We are running full virtual machines after all, there is some overhead involved within a machine.
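The arithmetic above can be sanity-checked directly (261752 kB is the total-memory figure quoted earlier):

```javascript
// The kernel reports memory in kB; 1024 kB = 1 MiB.
const memTotalKb = 261752;
const memTotalMiB = memTotalKb / 1024;
console.log(memTotalMiB.toFixed(1)); // prints "255.6", i.e. ~256 MiB
```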