Fly Proxy Error after OOM Server Reboot and No App Activity

Hello! I have a Fly app, battlesnake-rs. It’s a Rust app running Rocket, for context, though I don’t believe that’s super relevant.

I see in the logs that at 2022-02-16 04:56:15 my app got killed by an OOM error. Oops, might need a bigger box!

But after this I see a new instance boot up, as expected! It seems to boot in the same region, on the same instance, e9cee09d.

I see the server boot logs, but after that it never seems to receive traffic. The metrics in Fly also show basically no activity until much later, at 2022-02-17 08:56:59.

I DO see a few logs from the proxy containing Error 2: Internal problem. However, I don’t see nearly as many of those as the number of requests I would expect around that time. I see THREE such logs: one about an hour after server boot, at 2022-02-16 05:48:09, and the other two at 2022-02-17 06:57:27 and 2022-02-17 07:23:10.

This app is a Snake at play.battlesnake.com, which is only relevant because it receives relatively heavy traffic from about 00:00 - 08:00 Eastern US Time and is relatively idle for the rest of the day. So I would have expected quite a bit more than 3 requests on the morning of 2022-02-17.

However, the only logs during this period, from the proxy or my app, are the proxy errors from above.

The next log I see is at 2022-02-17 08:56:59, and it’s from my app’s 404 catcher. It gets other random (spammy) 404s through the rest of the day, and seems to be working as expected now!

I’m wondering what might have caused the app to not receive traffic for ~a day, and then recover. Was this related to the OOM error?

I can provide more details if they would be helpful!

Thanks!

We see OOMs frequently during deploys in our app running on 256M. Do OOMs correlate with deploys for you, too?

I don’t recommend it, but you can enable swap (with a lower swappiness) to temporarily mitigate this problem, or scale up to 512M and beyond:

# 2Gs
fly scale vm shared-cpu-1x --memory 2048
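
For the swap route, here is a rough sketch of what a startup script might look like, assuming the image ships `fallocate`, `mkswap`, `swapon` and `sysctl`, and that the script runs as root before the app starts. The variable names, the 512M size and the swappiness of 10 are illustrative choices, not recommended values. `DRY_RUN` is a hypothetical switch that defaults to only printing the commands, so nothing changes until you set it to 0.

```shell
#!/bin/sh
# Sketch only: create and enable a swap file with a low swappiness.
# SWAP_FILE, SWAP_SIZE, SWAPPINESS and DRY_RUN are hypothetical names for this example.
SWAP_FILE="${SWAP_FILE:-/swapfile}"
SWAP_SIZE="${SWAP_SIZE:-512M}"
SWAPPINESS="${SWAPPINESS:-10}"   # low value: prefer RAM, swap only under pressure
DRY_RUN="${DRY_RUN:-1}"          # 1 = print the commands, 0 = run them (needs root)

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

enable_swap() {
  run fallocate -l "$SWAP_SIZE" "$SWAP_FILE"   # reserve disk space for swap
  run chmod 0600 "$SWAP_FILE"                  # swap files must not be world-readable
  run mkswap "$SWAP_FILE"                      # write the swap signature
  run sysctl -w "vm.swappiness=$SWAPPINESS"    # lower the kernel's tendency to swap
  run swapon "$SWAP_FILE"                      # activate the swap file
}

enable_swap
```

Swap only hides memory pressure behind slower disk access, so treat it as a stopgap while you work out how much RAM the app really needs.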

I don’t think so; the deploys are pretty light on the server. Most of the ‘work’ is done on the builder when building the Dockerfile.

The OOM that seemed to cause my issue came from ‘normal’ usage.

I have bumped up this app to 1GB of RAM and it ran overnight fine last night!
