Simple Phoenix app with shared-cpu-1x is failing with OOM

I was making a simple HTML change to my Phoenix (1.7.11) app and it is now failing due to OOM. I tried reverting my changes and the same thing still happens. My Phoenix app is very simple and has 0 users. I am on the Hobby plan, so I'm using shared-cpu-1x.

Below is the error, and even the total VM memory used (1789260kB) is below the 256MB limit.
So basically I am clueless.

ams [info] [ 4.536562] Out of memory: Killed process 318 (beam.smp) total-vm:1789260kB, anon-rss:83092kB, file-rss:0kB, shmem-rss:71476kB, UID:65534 pgtables:472kB oom_score_adj:0
ams [info] INFO Main child exited with signal (with signal 'SIGKILL', core dumped? false)
ams [info] INFO Process appears to have been OOM killed!

It is not.

1789260 KiB = 1747.32 MiB > 256 MiB (these are kibibytes/mebibytes, not kilobytes/megabytes).
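
To spell the conversion out: 1789260 ÷ 1024 ≈ 1747.3 MiB, while a 256 MiB machine only has 256 × 1024 = 262144 KiB, so the reported total-vm is roughly 6.8× the machine's memory.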

You definitely need to add more memory - OOMs don’t lie :slight_smile:

  • Daniel

My Phoenix app started going into an OOM loop at deployment just now too. Tbf, mine veers dangerously close to the 256MB limit anyway (but has been deploying fine), so any new code added was my first suspect, especially as it still works fine under 512MB.

However, I rolled the release back to the last known healthy image that was working fine earlier (deployed 2.5 days ago), and even that one is getting OOM killed now.

Speaking of which, I noticed my Next.js app suddenly has about +60MB of usage in the last couple of days, and nothing changed that I can think of. Did something on Fly's end cause this jump in memory usage, or is it just a coincidence for the three of us?

I have the exact same problem with a simple Phoenix app. It has been running fine for a year now, but started getting OOM killed when restarting today.

arn [info] [ 4.898602] Out of memory: Killed process 318 (beam.smp) total-vm:1767996kB, anon-rss:79500kB, file-rss:0kB, shmem-rss:77600kB, UID:65534 pgtables:468kB oom_score_adj:0
arn [info] INFO Main child exited with signal (with signal 'SIGKILL', core dumped? false)
arn [info] INFO Process appears to have been OOM killed!

The total VM memory used is similar to OP's. It works fine when scaled to 512MB, but when I scale back down to 256MB I get the same problem.
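
For reference, I'm just using the stock flyctl scaling commands from the app directory (so the app name comes from fly.toml):

    fly scale memory 512    # boots and runs fine
    fly scale memory 256    # gets OOM killed again on boot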

@roadmr Thank you for the reply.
I misinterpreted it, since my usual memory usage was in the range of 180MiB.
I am still struggling to figure out what changed since yesterday. I tried reverting to the old version and had no luck, and the traffic has not changed at all; it is always very low, almost 0.
Is 512MiB of memory required to run a simple, basic Phoenix app now?

We have started suggesting a minimum of 1GB of RAM for full-stack apps. It’s actually kinda hard to run stuff in under 256MB.

For side projects it’s far better to let machines auto-stop / scale to zero for money-saving purposes.
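
If it helps, scale-to-zero is configured in the [http_service] section of fly.toml; roughly something like this (internal_port is just an example here — use whatever port your app listens on, and check the current docs for the exact option values):

    [http_service]
      internal_port = 8080
      auto_stop_machines = true
      auto_start_machines = true
      min_machines_running = 0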

We’re seeing normal numbers of OOMs across all customers.

If you’re curious, February 28th of this year was our busiest OOM day ever.

Is this across ALL memory permutations, or just the 256MB configuration? Did you notice any difference in 256MB => 512MB changes in the last few days? I’ve bumped my app up one tier to avoid the weird memory crawl; I wonder if that has any effect on your observation of normal OOM errors…

@kurt I believe there is something funky going on with Fly’s infra.

I have a TypeScript Temporal worker running on 256MB; it has flatlined at about 174MB for a while now. Then, literally a few minutes ago, I scaled the worker to 0 and rescaled it back to 1 (no code or config changes), and the baseline memory shot up to about 200MB.

I’m a little confused.

What’s also odd is that the previous stats showed 217MB total; now the total is 213MB, so something on Fly’s side is consuming 4 more MB.

For me, adding swap_size_mb to fly.toml fixed the OOM issue.
But I can still see a jump in memory usage from 180MB to 200MB without any code or traffic change.
Also, I am seeing the OOM with my Postgres now. :frowning:

swap_size_mb = 512
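
That's a top-level key in fly.toml, not nested under any section, so roughly like this (the app name is just a placeholder):

    app = "my-phoenix-app"
    swap_size_mb = 512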

With no change in application code, we’re also seeing new deploys fail. I’ve tested locally, constraining memory to 128MB for the app process (node) and separately constraining it via Docker; I have to limit memory to 64MB in order to trigger an OOM. There definitely appears to be something the matter with the 256MB VMs; our first deploy failed yesterday. I’ve also destroyed an existing machine and deployed to a new one with the same results. According to Fly’s Grafana, a Fly instance (machine/VM) for this app (when running) never uses more than 145MB.
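
For anyone who wants to repeat the local test, this is roughly the shape of it (the image name is a placeholder for our app image):

    # hard 128MB cap, with no extra swap beyond the limit
    docker run --rm --memory=128m --memory-swap=128m my-app-image

    # only at 64MB does the process actually get OOM killed
    docker run --rm --memory=64m --memory-swap=64m my-app-image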

Updated data point: to be clear, you can replicate a deploy exactly (by checking out and deploying a prior release that previously deployed stably) and see that the machine no longer boots.

For now, the app is deployed and running on 256MB, but I had to add a tiny swap partition. That’s swap_size_mb = 128 as a top-level option in fly.toml.

I wouldn’t recommend using swap_size_mb as a workaround, since suspend doesn’t support it. We’ll have to wait and see whether this change was intentional or not. If it was, I guess they are kicking the 256MB bums (like myself) off the bus bench.

Where is the incompatibility you mention documented? I couldn’t find anything that mentions it.

It’s not documented in the official docs, but it’s mentioned in a post: New feature in preview: suspend/resume for Machines

Any updates from Fly on this?

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.