JVM Application OOMs on Fly but not in similar conditions on local machine

Hello!

I have a JVM (clojure) application deployed on fly that regularly OOMs. It’s running on a shared-cpu-4x:2048MB in mia.

I built the project locally using the same Dockerfile I deploy with:

sudo docker build . 
sudo docker run --memory=2048M -p 9010:9010 -p 3000:3000 ${imageID}

and then connected to it with VisualVM to monitor it.
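For context, VisualVM attaches over JMX on port 9010; the options that enable that look roughly like the following (illustrative, not a copy of my Dockerfile, and the exact properties, ports, and hostname are assumptions):

# JMX remote options so VisualVM can attach on 9010 (no auth/SSL, local testing only)
JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.rmi.port=9010 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=127.0.0.1"
sudo docker run --memory=2048M -p 9010:9010 -p 3000:3000 -e JDK_JAVA_OPTIONS="$JMX_OPTS" ${imageID}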

On my local machine, the app’s memory usage grows but fluctuates as the garbage collector cleans things up. On the deployed app, it appears that garbage collection never happens: memory goes up, never fluctuating, until OOM restarts the app.

I’m using the eclipse-temurin:17-alpine image in the Dockerfile to both build and run the application.

Am I doing something wrong to cause this behavior?
Given these facts:

  • doesn’t OOM on local machine (running on docker)
  • OOMs on fly (firecracker)
  • same memory available to both processes
  • JVM heap restricted to 85% of available memory (with -XX:MaxRAMPercentage=85; quick check below)

and that I’m more or less at the limits of my container knowledge, I’m starting to wonder if it’s something on the Fly side. To test that, I was going to try launching a Firecracker VM from the Dockerfile myself, but that seems like a good deal of work, so I figured I would stop and ask whether there’s anything I’m missing.
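(For comparison, one quick way to see what heap size the JVM actually derives from that percentage, in either environment, is something along these lines; the invocation is illustrative, not copied from my setup:)

# prints the max heap the JVM estimates from the RAM it detects
java -XX:MaxRAMPercentage=85 -XshowSettings:vm -version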

Thanks for looking over this.

Maybe try specifying max memory as an absolute value (-Xmx) instead of as a percentage.

In the past, people have reported nuances in how the JVM and similar language runtimes determine the machine’s total available memory (due to cgroups).

[.net 5/6 app killed because of memory usage; no cgroup limit being the reason?]
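If you want to see what the runtime detects inside the Machine itself, something along these lines should show it (a sketch; fly ssh console drops you into a shell in the VM):

fly ssh console
# inside the VM: print the OS/container memory and CPU the JVM thinks it has
java -XshowSettings:system -version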

Thanks for the response! I tried -Xmx1740M (85% of 2048) but still seeing the same behavior unfortunately. Memory use goes up without ever going down.

Since Fly.io runs VMs, the 2048MB is used not only by your application but also by the Linux kernel itself. So your application may need more memory.

This part is strange. What does your application do?

Thanks for the response! Good point about the VMs; I wasn’t actually sure of my application’s memory requirements. I tested by running my application with -Xmx1024M and then observing it with VisualVM:

What does your application do?

It receives Excel documents (~3MB) uploaded via a web form, and I do some processing on them before producing an output file. The spikes in the screenshot above (from ~100MB to ~500MB, then from ~500 to ~925, then from ~600 back to ~925) are when I upload an Excel doc. You can see that garbage collection (or something else that drops used heap; I presume that’s GC) happens immediately after the second document is uploaded. I don’t see that drop when monitoring the deployed application:

It increases until it OOMs when I try to upload a spreadsheet again.

I repeated the above test on my local machine with 512MB and 768MB limits and wasn’t able to upload the Excel document more than once, so I think the lower bound is comfortably 1024MB, or half of what my deployed app is allotted. Am I remembering correctly that the VM’s overhead is ~300MB?

That’s a generous allowance for the Linux kernel + auxiliaries (like hallpass), but I would still try his suggestion…

How about sizing your Fly Machine at 4G (temporarily, just for testing) and then using -Xmx1024M as the JVM flag?

Personally, I would also throw in -XX:+PrintGC, -XX:+PrintGCDetails, and -XX:+PrintFlagsFinal. The output of that second one in particular is voluminous and hard to get accustomed to, but it really would eliminate the ambiguity about whether a GC actually happened, etc.
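Concretely, something like this (a sketch using the flyctl CLI; app.jar is a placeholder for however you launch the JVM, and on JDK 17 the unified-logging form -Xlog:gc* is the newer spelling of the GC print flags):

fly scale memory 4096    # temporarily bump the Machine to 4GB for the test
java -Xmx1024M -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintFlagsFinal -jar app.jar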

Tried that out, much improved:

I tried the flags you suggested but deploy failed with this in the logs:

2023-10-05 09:53:07.964 [my_app_str] [INFO] mia b984 32874973f30378 my-app-name NOTE: Picked up JDK_JAVA_OPTIONS: -XshowSettings:system -Xmx1024M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/dumps -XX:+PrintGC -XX:+PrintGCDetails XX:+PrintFlagsFinal
2023-10-05 09:53:07.964 [my_app_str] [INFO] mia b984 32874973f30378 my-app-name Error: Cannot specify main class in environment variable JDK_JAVA_OPTIONS

From the graph above, I gather I was giving the heap too large a share of memory, and the rest of the JVM wasn’t able to cope with what was left. Is that along the right lines?

XX:+PrintFlagsFinal needs a leading -, just like the other -XX flags :slight_smile:
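(Without the leading -, the java launcher reads that token as a main class name, which is where that error comes from.) Corrected, the value would be:

JDK_JAVA_OPTIONS="-XshowSettings:system -Xmx1024M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/dumps -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintFlagsFinal"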

For troubleshooting the JVM’s memory issues, I do recommend using JVM-specific tools first. There are young objects, old objects, and special-purpose spaces such as Metaspace. Our “Memory Utilization” graph can’t capture these language-specific details.
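For example, from a shell on the Machine (assuming the JVM runs as PID 1 there and the JDK tools are present in the image):

jcmd 1 GC.heap_info     # current per-generation heap occupancy
jcmd 1 GC.run           # request a full GC, then check heap_info again
jstat -gcutil 1 5s      # young/old/Metaspace utilization sampled every 5 seconds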

:man_facepalming: good eye!

Thank you for all the help and explanation, I appreciate it!
