JVM Application OOMs on Fly but not in similar conditions on local machine

lucianolaratelli · October 4, 2023, 6:39pm

Hello!

I have a JVM (Clojure) application deployed on Fly that regularly OOMs. It’s running on a shared-cpu-4x Machine with 2048 MB of RAM in the mia region.

I built the project locally using the same Dockerfile I deploy with:

sudo docker build . 
sudo docker run --memory=2048M -p 9010:9010 -p 3000:3000 ${imageID}

and then connected to it with VisualVM to monitor it.
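The post doesn’t say how VisualVM reaches the JVM on port 9010. A common way is remote JMX. Here is a minimal bash sketch under that assumption. The helper name `jmx_opts` and the image tag `my-app` are hypothetical. Authentication and SSL are turned off, so this is for local debugging only. If the Dockerfile already sets `JDK_JAVA_OPTIONS`, merge these flags with it rather than replacing it.

```shell
# Hypothetical helper: system properties that expose remote JMX on a single
# port so VisualVM can attach to a containerised JVM through a published port.
# Authentication and SSL are disabled: local debugging only.
jmx_opts() {
  local port=${1:-9010}
  local -a opts=(
    -Dcom.sun.management.jmxremote
    "-Dcom.sun.management.jmxremote.port=$port"
    "-Dcom.sun.management.jmxremote.rmi.port=$port"   # same port for RMI, so one -p mapping suffices
    -Dcom.sun.management.jmxremote.authenticate=false
    -Dcom.sun.management.jmxremote.ssl=false
    -Djava.rmi.server.hostname=127.0.0.1
  )
  printf '%s\n' "${opts[*]}"   # space-joined, ready for JDK_JAVA_OPTIONS
}

# Usage (the image tag "my-app" is hypothetical):
#   sudo docker build -t my-app .
#   sudo docker run --memory=2048M -p 9010:9010 -p 3000:3000 \
#     -e JDK_JAVA_OPTIONS="$(jmx_opts 9010)" my-app
```

Tagging the image with `-t` avoids copying the image ID between the two commands.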

On my local machine, the app’s memory usage grows but fluctuates as the garbage collector cleans things up. On the deployed app, it appears that garbage collection never happens: memory goes up without ever fluctuating until the app is OOM-killed and restarted.

I’m using the eclipse-temurin:17-alpine image in the Dockerfile to build and run the application.

Am I doing something wrong to cause this behavior?
Given these facts:

- doesn’t OOM on my local machine (running in Docker)
- OOMs on Fly (Firecracker)
- same memory available to both processes
- JVM restricted to 85% of max memory (with -XX:MaxRAMPercentage=85)

and that I’m more or less at the limits of my container knowledge, I’m starting to wonder if it’s something on the Fly side. To test this, I was going to try launching a Firecracker VM from the Dockerfile, but that seems like a good deal of work, so I figured I would stop and ask whether there’s anything I’m missing.

Thanks for looking over this.

mayailurus · October 4, 2023, 7:02pm

Maybe try specifying the max heap size as an absolute value (e.g. -Xmx) instead of as a percentage.

In the past, people have reported nuances with how JVM-like language runtimes determine the machine’s total available memory (due to cgroups).

[.net 5/6 app killed because of memory usage; no cgroup limit being the reason?]
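One way to check this is to ask the JVM which heap ceiling it actually computed, once in each environment. For example, run it inside the local `docker run --memory=2048M` container and again in a `fly ssh console` session on the Machine. Here is a minimal bash sketch. It assumes `java` is on the PATH, and the helper name `max_heap_mb` is hypothetical. It relies on `-XX:+PrintFlagsFinal`, which prints one line per flag with the value in the fourth column.

```shell
# Print the max heap size (MB) that the JVM computes for this environment.
# Extra JVM flags (e.g. -XX:MaxRAMPercentage=85) can be passed as arguments;
# JDK_JAVA_OPTIONS, if set, is also picked up by the java launcher.
max_heap_mb() {
  java "$@" -XX:+PrintFlagsFinal -version 2>/dev/null |
    awk '$2 == "MaxHeapSize" { printf "%d\n", $4 / 1048576 }'
}

# Usage, once locally in the container and once on the Fly Machine:
#   max_heap_mb -XX:MaxRAMPercentage=85
```

If the two numbers differ, the runtime is sizing itself from different views of "total memory" in the two environments.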

lucianolaratelli · October 4, 2023, 7:15pm

Thanks for the response! I tried -Xmx1740M (85% of 2048) but I’m still seeing the same behavior, unfortunately. Memory use goes up without ever going down.
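The arithmetic behind -Xmx1740M shows how little it leaves for everything outside the heap. Below is a small bash helper that sketches it (`heap_budget` is a hypothetical name). It is back-of-the-envelope only: the JVM rounds heap sizes to alignment boundaries, and non-heap usage is not a fixed amount.

```shell
# Split a memory size (MB) into a heap ceiling at the given percentage and the
# remainder left for everything else: Metaspace, thread stacks, code cache,
# direct buffers, and on a Fly VM also the guest kernel.
heap_budget() {
  local total_mb=$1 pct=$2
  local heap_mb=$(( total_mb * pct / 100 ))   # integer MB, rounded down
  printf 'heap=%sM rest=%sM\n' "$heap_mb" "$(( total_mb - heap_mb ))"
}

# heap_budget 2048 85   # prints: heap=1740M rest=308M
# heap_budget 2048 50   # prints: heap=1024M rest=1024M
```

With 85% of 2048 MB given to the heap, roughly 308 MB remains for every non-heap consumer combined.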

kaz · October 4, 2023, 9:51pm

Since Fly.io runs VMs, the 2048 MB is used not only by your application but also by the Linux kernel itself, so your application may need more memory.

This part is strange. What does your application do?

lucianolaratelli · October 5, 2023, 1:18am

Thanks for the response! Good point about the VMs; I wasn’t actually sure of my application’s memory requirements. I tested by running my application with -Xmx1024M and then observing it with VisualVM:

What does your application do?

It receives Excel documents (~3 MB) uploaded via a web form, and I do some processing on them before producing an output file. The spikes in the screenshot above (from ~100 MB to ~500 MB, then from ~500 MB to ~925 MB, then from ~600 MB back to ~925 MB) are when I upload an Excel document. You can see that garbage collection (or something else that drops used heap; I presume it’s GC) happens immediately after the second document is uploaded. I don’t see that drop when monitoring the deployed application:

It increases until it OOMs when I try to upload a spreadsheet again.

I repeated the above test on my local machine with 512 MB and 768 MB limits and wasn’t able to upload the Excel document more than once, so I think 1024 MB is a comfortable lower bound, half of what my deployed app is allotted. Am I remembering correctly that the VM itself takes ~300 MB?

mayailurus · October 5, 2023, 2:41am

That’s a generous allowance for the Linux kernel plus auxiliaries (like hallpass), but I would still try kaz’s suggestion.

How about sizing your Fly Machine at 4 GB (temporarily, just for testing) and then using -Xmx1024M as the JVM flag?

Personally, I would also throw in -XX:+PrintGC, -XX:+PrintGCDetails, and -XX:+PrintFlagsFinal. The output of that second one in particular is voluminous and hard to get accustomed to, but it really would eliminate the ambiguity about whether a GC actually happened, etc.
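On JDK 17, which is the eclipse-temurin:17-alpine base, -XX:+PrintGC and -XX:+PrintGCDetails are deprecated aliases for unified logging (-Xlog:gc and -Xlog:gc*). They still work but print a deprecation warning. Here is a bash sketch of assembling the diagnostic options. It assumes they are passed through `JDK_JAVA_OPTIONS`, as in the deploy log in the next post. The /data/dumps path is taken from that log. Building the string from an array makes a missing leading `-` easy to spot.

```shell
# Diagnostic flags for a JDK 17 JVM, one array element per flag; every flag
# starts with "-". The glob in -Xlog:gc* is quoted so the shell never expands it.
diag_opts=(
  -XshowSettings:system             # OS/container limits the JVM detects
  -Xmx1024M                         # fixed heap ceiling instead of a percentage
  -XX:+HeapDumpOnOutOfMemoryError
  -XX:HeapDumpPath=/data/dumps
  '-Xlog:gc*'                       # unified-logging form of -XX:+PrintGCDetails
  -XX:+PrintFlagsFinal              # prints every flag's final value at startup
)
export JDK_JAVA_OPTIONS="${diag_opts[*]}"   # space-joined
```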

lucianolaratelli · October 5, 2023, 2:19pm

Tried that out, much improved:

I tried the flags you suggested, but the deploy failed with this in the logs:

2023-10-05 09:53:07.964 [my_app_str] [INFO] mia b984 32874973f30378 my-app-name NOTE: Picked up JDK_JAVA_OPTIONS: -XshowSettings:system -Xmx1024M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/dumps -XX:+PrintGC -XX:+PrintGCDetails XX:+PrintFlagsFinal
2023-10-05 09:53:07.964 [my_app_str] [INFO] mia b984 32874973f30378 my-app-name Error: Cannot specify main class in environment variable JDK_JAVA_OPTIONS

From the graph above, I gather I was giving the heap too large a share of the machine’s memory, and the rest of the JVM wasn’t able to cope with what was left. Is that along the right lines?

kaz · October 5, 2023, 3:32pm

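XX:+PrintFlagsFinal needs a leading -, like the other -XX flags. Without it, the launcher reads the token as a main class name, which is what the error is complaining about.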
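A pre-flight check like the one below could catch a missing dash before deploying. It is a bash sketch with a hypothetical helper name, `check_jvm_opts`. It is only a heuristic that suits the -X/-XX/-D style flags used in this thread. Options that take a separate value argument (such as `--add-opens java.base/java.lang=ALL-UNNAMED`) would be flagged as well.

```shell
# Fail if any whitespace-separated token in a JDK_JAVA_OPTIONS-style string
# does not start with "-". The java launcher would treat such a token as the
# main class, which JDK_JAVA_OPTIONS does not allow.
check_jvm_opts() {
  local -a opts
  local opt
  read -r -a opts <<< "$1"          # split on whitespace, no glob expansion
  for opt in "${opts[@]}"; do
    if [[ $opt != -* ]]; then
      echo "not a JVM option (missing leading '-'): $opt" >&2
      return 1
    fi
  done
}

# Usage, e.g. at the top of an entrypoint script:
#   check_jvm_opts "$JDK_JAVA_OPTIONS" || exit 1
```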

For troubleshooting the JVM’s memory issues, I recommend using JVM-specific tools first. The heap is split into young and old generations, and there are special-purpose areas outside the heap such as Metaspace. Our “Memory Utilization” graph can’t capture these runtime-specific details.
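The JDK’s own `jcmd` and `jstat` tools can show this per-area breakdown from inside the running Machine, for example from a `fly ssh console` session. The bash sketch below has a few requirements. The image must include the JDK tools; the eclipse-temurin:17-alpine JDK image does. The commands must also run as the same user that owns the JVM process, because they attach to it. The helper name `jvm_memory_report` is hypothetical, and `jcmd -l` lists the PIDs of running JVMs.

```shell
# Sample the JVM's own view of its memory. Arguments: PID, number of samples.
jvm_memory_report() {
  local pid=$1 samples=${2:-5}
  # Current heap layout by generation/region, plus Metaspace usage.
  jcmd "$pid" GC.heap_info
  # Utilisation (%) of survivor (S0/S1), eden (E), old gen (O), Metaspace (M)
  # and GC counts (YGC, FGC), sampled every 1000 ms. If the GC counts stay
  # flat while usage climbs, collections are not happening.
  jstat -gcutil "$pid" 1000 "$samples"
}

# Usage:
#   jcmd -l                      # find the PID
#   jvm_memory_report <pid> 10
```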

lucianolaratelli · October 5, 2023, 3:39pm

Good eye!

Thank you for all the help and explanation, I appreciate it!

system · October 12, 2023, 3:40pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
