I was making a simple HTML change to my Phoenix (1.7.11) app, and it is now failing due to OOM. I tried reverting my changes, but the same thing keeps happening. My Phoenix app is very simple and has zero users. I am on the Hobby plan, so I’m using shared-cpu-1x.
Below is the error. Even the resident memory it reports (anon-rss plus shmem-rss, roughly 150MB) is below the 256MB limit.
So basically I am clueless
ams [info] [ 4.536562] Out of memory: Killed process 318 (beam.smp) total-vm:1789260kB, anon-rss:83092kB, file-rss:0kB, shmem-rss:71476kB, UID:65534 pgtables:472kB oom_score_adj:0
ams [info] INFO Main child exited with signal (with signal 'SIGKILL', core dumped? false)
ams [info] INFO Process appears to have been OOM killed!
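In case it helps, this is how I’m now reading that OOM line (my interpretation of the kernel’s fields, so treat it as an assumption): total-vm is virtual address space the BEAM reserves, while the actual resident usage is anon-rss plus shmem-rss:

```
anon-rss    83092 kB
shmem-rss   71476 kB
--------------------
resident   154568 kB  ≈ 151 MiB  (below the 256MB machine size)
total-vm  1789260 kB  ≈ 1.7 GiB  (reserved virtual address space, not resident memory)
```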
My Phoenix app started going into an OOM loop at deployment just now, too. To be fair, mine veers dangerously close to the 256MB limit anyway (but has been deploying fine), so any newly added code was my first suspect, especially as it still works fine with 512MB.
However, I rolled back to the last known healthy image, which was working fine earlier (deployed 2.5 days ago), and even that one is getting OOM killed now.
Speaking of which, I noticed my Next.js app suddenly has about +60MB of usage in the last couple of days, with nothing changed that I can think of. Did something on Fly’s end cause this jump in memory usage, or is it just a coincidence for the three of us?
I have the exact same problem with a simple Phoenix app. It has been running fine for a year now, but started getting OOM killed when restarting today.
arn [info] [ 4.898602] Out of memory: Killed process 318 (beam.smp) total-vm:1767996kB, anon-rss:79500kB, file-rss:0kB, shmem-rss:77600kB, UID:65534 pgtables:468kB oom_score_adj:0
arn [info] INFO Main child exited with signal (with signal 'SIGKILL', core dumped? false)
arn [info] INFO Process appears to have been OOM killed!
The memory reported is similar to the OP’s. Everything works fine when scaling to 512MB, but when I scale back down to 256MB I get the same problem.
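For reference, the resize is just the standard flyctl command, nothing exotic (app name taken from fly.toml in my case; add -a &lt;app&gt; otherwise):

```sh
# Works: boot the machine with 512MB
fly scale memory 512

# Fails: scaling back down to 256MB brings the OOM kill right back
fly scale memory 256
```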
@roadmr Thank you for the reply.
I misinterpreted it, since my usual memory usage was in the range of 180MiB.
I am still struggling to figure out what changed since yesterday. I tried reverting to the old version with no luck. And the traffic has not changed at all; it is always very low, almost zero.
Is 512MiB of memory now required to run a basic Phoenix app?
Is this across ALL memory configurations, or just the 256MB one? Did you notice any difference after changing from 256MB to 512MB in the last few days? I’ve bumped my app up one tier to avoid the weird memory creep; I wonder if that has any effect on your observation of normal OOM errors…
I have a TypeScript Temporal worker running on 256MB; it has flatlined at about 174MB for a good while. Then, literally a few minutes ago, I scaled the worker to 0 and rescaled it back to 1 (no code or config changes), and the baseline memory shot up to about 200MB.
I’m a little confused.
What’s also odd is that the stats previously showed a total of 217MB, and now the total is 213MB, so something on Fly’s side is consuming 4 more MB.
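To be clear, the rescale itself was nothing more than this (assuming flyctl, with no code or config changes in between):

```sh
# Stop the worker entirely, then bring a single machine back up
fly scale count 0
fly scale count 1
```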
For me, adding swap_size_mb to fly.toml fixed the OOM issue.
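Concretely, I added a top-level swap_size_mb entry to fly.toml; 512 is just the value I happened to pick:

```toml
# fly.toml -- give the 256MB machine some swap to absorb the memory spike (value in MB)
swap_size_mb = 512
```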
But I can see a jump in memory usage from 180 MB to 200 MB without any code or traffic change.
Also, I am now seeing OOMs with my Postgres instance.
With no change in application code, we’re also seeing new deploys fail. I’ve tested locally, constraining memory to 128MB for the app process (node) and separately constraining it via Docker; I have to limit memory to 64MB in order to trigger an OOM. There definitely appears to be something the matter with the 256MB VMs; our first deploy failed yesterday. I’ve also destroyed an existing machine and deployed to a new one, with the same results. According to Fly’s Grafana, a Fly instance (machine/VM) for this app, when running, never uses more than 145MB.
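For anyone who wants to repeat the local test, it was roughly the following; the image name and start command are placeholders for our app, not anything Fly-specific:

```sh
# Cap the container at 128MB with no extra swap: the node process runs fine.
# Only at roughly 64MB does it actually get OOM killed locally.
docker run --rm --memory=128m --memory-swap=128m my-app-image node server.js
```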
Updated data point: to be clear, you can replicate a deploy exactly (by checking out and deploying a prior release that had deployed stably) and see that the machine no longer boots.
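Concretely, the replication amounts to this (the commit reference is a placeholder for whatever release last deployed cleanly):

```sh
# Rebuild and redeploy the exact code of a release that used to boot fine at 256MB
git checkout <previously-stable-release-sha>
fly deploy
```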
I wouldn’t recommend using swap_size_mb as a workaround, since suspend doesn’t support it. We’ll have to wait and see whether this change was intentional or not. If it was, I guess they’re kicking the 256MB bums (like myself) off the bus bench.