and then connected to it with VisualVM to monitor it.
On my local machine, the app’s memory usage grows but fluctuates as the garbage collector cleans things up. On the deployed app, it appears that garbage collection never happens: memory goes up, never fluctuating, until OOM restarts the app.
I’m using the eclipse-temurin:17-alpine image to build and run the application in the Dockerfile.
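Roughly, the Dockerfile is a two-stage build along these lines (the Gradle step and jar path here are simplified placeholders, not the exact project layout):

```dockerfile
# Build stage: compile the app with the same Temurin 17 Alpine image
FROM eclipse-temurin:17-alpine AS build
WORKDIR /app
COPY . .
# Placeholder build step -- the real project's build tool may differ
RUN ./gradlew --no-daemon bootJar

# Run stage: only the built jar ends up in the final image
FROM eclipse-temurin:17-alpine
WORKDIR /app
COPY --from=build /app/build/libs/app.jar app.jar
# Cap the heap relative to the container's memory limit
ENV JDK_JAVA_OPTIONS="-XX:MaxRAMPercentage=85"
ENTRYPOINT ["java", "-jar", "app.jar"]
```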
Am I doing something wrong to cause this behavior?
Given these facts:

- it doesn’t OOM on my local machine (running in Docker)
- it OOMs on Fly (Firecracker)
- the same memory is available to both processes
- the JVM is restricted to 85% of max memory (with -XX:MaxRAMPercentage=85)
and given that I’m more or less at the limits of my container knowledge, I’m starting to wonder whether it’s something on the Fly side. To test this, I was going to try launching a Firecracker VM from the Dockerfile myself, but that seems like a good deal of work, so I figured I’d stop and ask in case there’s something I’m missing.
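For reference, to keep the memory limits comparable I run the local container with a hard cap, roughly like this (image tag and port are placeholders; 2048 MB is what the deployed Machine has allotted):

```sh
# Local run with the same hard memory cap as the deployed Machine
docker run --rm -p 8080:8080 \
  --memory=2048m --memory-swap=2048m \
  my-app:latest
```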
Thanks for the response! Good point about the VMs; I wasn’t actually sure of my application’s memory requirements. I tested by running my application with -Xmx1024M and then observing with VisualVM:
It receives Excel documents (~3 MB) uploaded via web form and I do some processing on them before producing an output file. The spikes in the screenshot above (from ~100 MB to ~500 MB, then from ~500 MB to ~925 MB, then from ~600 MB back to ~925 MB) are when I upload an Excel doc. You can see that garbage collection (or something else that drops used heap; I presume that’s GC) happens immediately after the second document is uploaded. I don’t see that drop when monitoring the deployed application:
It increases until it OOMs when I try to upload a spreadsheet again.
I repeated the above test on my local machine with 512 MB and 768 MB limits and wasn’t able to upload the Excel document more than once, so I think the lower bound is comfortably 1024 MB, or half of what my deployed app has allotted. Am I remembering correctly that the VM overhead is ~300 MB?
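For completeness, the local test runs were along these lines (the jar name is a placeholder; VisualVM attaches to local processes directly, so no extra monitoring flags were needed):

```sh
# 1 GB heap cap -- survives repeated uploads, with GC kicking in
java -Xmx1024M -jar app.jar

# Lower caps -- both failed on the second upload
java -Xmx768M -jar app.jar
java -Xmx512M -jar app.jar
```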
That’s a generous allowance for the Linux kernel + auxiliaries (like hallpass), but I would still try his suggestion…
How about sizing your Fly Machine at 4G (temporarily, just for testing) and then using -Xmx1024M as the JVM flag?
Personally, I would also throw in -XX:+PrintGC, -XX:+PrintGCDetails, and -XX:+PrintFlagsFinal. The output of that second one in particular is voluminous and hard to get accustomed to, but it really would eliminate the ambiguity about whether a GC actually happened, etc.
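If it helps, the resize for the test is a one-liner with the Fly CLI (scale it back down afterwards):

```sh
# Temporarily give the Machine 4 GB of RAM for the test
fly scale memory 4096
```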
I tried the flags you suggested, but the deploy failed with this in the logs:
```
2023-10-05 09:53:07.964 [my_app_str] [INFO] mia b984 32874973f30378 my-app-name NOTE: Picked up JDK_JAVA_OPTIONS: -XshowSettings:system -Xmx1024M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/dumps -XX:+PrintGC -XX:+PrintGCDetails XX:+PrintFlagsFinal
2023-10-05 09:53:07.964 [my_app_str] [INFO] mia b984 32874973f30378 my-app-name Error: Cannot specify main class in environment variable JDK_JAVA_OPTIONS
```
From the graph above, I gather I was giving the heap too much memory and the rest of the JVM couldn’t cope with that. Is that along the right lines?
-XX:+PrintFlagsFinal needs a leading dash, like the other -XX flags. Without it, the launcher treats that bare token as a main class name, which is what the “Cannot specify main class” error is about; it isn’t a memory problem.
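That is, the same JDK_JAVA_OPTIONS value from your log, with the missing dash added:

```sh
JDK_JAVA_OPTIONS="-XshowSettings:system -Xmx1024M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/dumps -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintFlagsFinal"
```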
For troubleshooting the JVM’s memory issues, I’d recommend using JVM-specific tools first. There are young objects, old objects, and special-purpose spaces such as Metaspace; our “Memory Utilization” graph can’t capture these language-specific details.
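For example, assuming the image ships the full JDK tools (the non-JRE Temurin images do) and <pid> is your Java process ID inside the Machine:

```sh
# Heap layout and occupancy (young/old regions) straight from the JVM
jcmd <pid> GC.heap_info

# Per-class histogram of what is actually filling the heap
jmap -histo <pid>

# Native (non-heap) memory summary; needs -XX:NativeMemoryTracking=summary at startup
jcmd <pid> VM.native_memory summary
```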