High Firecracker Load Average and unresponsive application process

We have a Rails app with 2 processes: web and worker (Sidekiq). The app is concordia-production-web. Sometimes the worker stops. By the time we discover this, enough time has passed that the worker’s logs have been pushed out of the log history by the web process’s logs, so we can’t figure out what happened. When I restart the worker process, it starts working again. I noticed an interesting metric:

Maybe you know what this could mean?

P.S. This has happened twice. I’ve now realized that I could try to connect via SSH and see what happened, but this instance (4636865b) is no longer running. I think that the first time this happened, I couldn’t connect and got an error.

I took a look at our logs for the 4636865b instance and it seems like it just stopped logging completely for 25 hours.

Here they are in reverse chronological order (top is most recent):

Hm… interesting… no errors… Could there be some kind of memory leak, or some other internal problem with our app? I’m trying to understand why the Firecracker Load Average metric was at its maximum… these things are obviously related.

Hi @jerome

Could you please help me find a solution? It’s important because this is production.

Can you try to run fly ssh console to get into your instance while this is happening?

You can then run top and other tools to find out what’s using so many resources.
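
If top is awkward to read in the moment, here is a minimal sketch of the same check as a script, assuming Python 3 happens to be installed in the instance's image (it may not be in a Rails image). On Linux the load average counts tasks that are runnable (state R) plus tasks in uninterruptible sleep (state D, usually waiting on disk or other I/O), so listing processes in those two states often shows what is driving the number up. The function names are my own, not part of any Fly.io tooling.

```python
import os

def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into (1m, 5m, 15m) floats."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)

def parse_stat(text):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat."""
    # The command is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the LAST ')' rather than on whitespace.
    left, right = text.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]

def busy_processes(proc="/proc"):
    """List processes that are running (R) or in uninterruptible sleep (D)."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if state in ("R", "D"):
            found.append((pid, comm, state))
    return found

if __name__ == "__main__":
    if os.path.exists("/proc/loadavg"):
        with open("/proc/loadavg") as f:
            print("load average (1m, 5m, 15m):", parse_loadavg(f.read()))
        for pid, comm, state in busy_processes():
            print(pid, state, comm)
    else:
        print("/proc/loadavg not found; this only works on Linux")
```

Many processes stuck in state D would point towards I/O (or swapping caused by memory pressure) rather than CPU, which would fit the memory leak theory above; many in state R would point towards CPU contention. The script will normally list itself as R, since it is running while it scans.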