High CPU usage all of a sudden

Evently · January 21, 2024, 4:23pm

Hello, we are experiencing something weird with our Fly deployment. All of a sudden the CPU utilization spikes and then never goes down. There is no apparent use on the server that should cause it. This is how it looks in Grafana. I’m not sure what “steal” is, but that seems to be the culprit. Is there anything we can do about it?

Worth mentioning is that it’s the same problem on both our Postgres machine and the app (the Remix server).

ignoramous · January 21, 2024, 4:51pm

Are you using swap? We see such spikes when kswapd frantically tries to keep the processes running despite overwhelming memory pressure (probably caused by a memory leak we haven’t yet tracked down); a tonne of madvise syscalls and what not push up CPU utilization (try using strace and friends the next time you catch it happening).
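A minimal sketch of how one might check for the swap and memory pressure described above, assuming a Linux machine where `/proc/meminfo` is readable. The function names (`parse_meminfo`, `memory_summary`, `read_summary`) are made up for illustration and are not part of any Fly tooling.

```python
def parse_meminfo(meminfo_text):
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    values = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_summary(meminfo_text):
    """Summarise swap usage and available memory as percentages."""
    info = parse_meminfo(meminfo_text)
    swap_total = info.get("SwapTotal", 0)
    swap_used = swap_total - info.get("SwapFree", 0)
    mem_total = info["MemTotal"]
    # MemAvailable is missing on very old kernels; fall back to MemFree.
    mem_available = info.get("MemAvailable", info.get("MemFree", 0))
    return {
        "swap_used_kb": swap_used,
        "swap_used_pct": 100.0 * swap_used / swap_total if swap_total else 0.0,
        "mem_available_pct": 100.0 * mem_available / mem_total,
    }


def read_summary(path="/proc/meminfo"):
    """Read the live values from the machine (Linux only)."""
    with open(path) as f:
        return memory_summary(f.read())
```

Calling `read_summary()` on the machine while the spike is happening shows whether swap is heavily used and how much memory is still available. High swap usage together with little available memory would point toward the memory-pressure explanation.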

Evently · January 21, 2024, 5:10pm

On our Remix app we use swap, so I will look into it, though the memory utilization is pretty low. On our Postgres machine we don’t use swap, so for that server I don’t think that is the problem. The weird thing is that it happened at the exact same time on both machines, but we didn’t see a spike in users or anything.

AsymetricalData · January 21, 2024, 7:14pm

I had the same issue yesterday (Sat. 20 January) from approx. 14:00 to 23:00.

I still don’t know why but it stopped.

Evently · January 21, 2024, 8:21pm

Did you do anything to fix it? It is still a problem for us.

AsymetricalData · January 21, 2024, 9:24pm

Nothing at all. I restarted the VM multiple times but it didn’t help at all.
And suddenly at 23:00, it stopped.

Still no clue about it.

Evently · January 21, 2024, 9:30pm

I see, I have also tried restarting them without any change. Very annoying.

Evently · January 22, 2024, 8:14am

I can confirm that our problem solved itself around 23:00-23:20. I have a feeling that it is a problem with the infrastructure and not something that we can do anything about. I would like to talk to Fly about it, but we are only on the hobby plan for now. If the problem keeps coming back, we will have to do something.

Evently · January 22, 2024, 12:46pm

Today the CPU usage went up again about an hour ago. Will have to keep looking into it.

zifnab · January 22, 2024, 6:19pm

“steal” can generally be ignored here. We allocate partial cores for CPUs, so you should be able to burst up to the full CPU core if it’s not in use. If it is in use, your machine’s kernel reports that time as “steal”. It doesn’t mean your application is using more CPU - it just means the idle time for your CPU was used elsewhere.

Generally it should be near 0 (we run out of allocatable memory on a host long before we run out of allocatable CPU). If enough machines burst at the same time on the same host, you’ll see steal reported.
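To see the number behind the “steal” series yourself, here is a minimal sketch, assuming a Linux guest where `/proc/stat` is readable. It computes the share of CPU time reported as steal between two samples of the aggregate `cpu` line, whose counters are ordered user, nice, system, idle, iowait, irq, softirq, steal. The function names are made up for illustration.

```python
import time


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a list of ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before_text, after_text):
    """Percentage of CPU time reported as steal between two /proc/stat samples."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [b - a for a, b in zip(before, after)]
    # Sum user..steal only; guest time is already counted inside user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


def sample_steal(interval=1.0, path="/proc/stat"):
    """Take two live samples 'interval' seconds apart (Linux only)."""
    with open(path) as f:
        first = f.read()
    time.sleep(interval)
    with open(path) as f:
        second = f.read()
    return steal_percent(first, second)
```

Running `sample_steal()` a few times during a spike and comparing it with a quiet period shows whether the extra “CPU utilization” is time your own processes used or time the kernel reports as taken elsewhere.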

Evently · January 22, 2024, 7:58pm

I see, but what I don’t understand is why we have been getting these spikes over the last two weeks or so. The only indication that something is happening is that our website loads many times slower than it does 90% of the time, and it is not due to our own traffic to the page. The only thing that changes (on two different machines, our Postgres and our Remix app, at the exact same time) is our CPU utilization. This image is from today:

We did not have a spike in visitors, and our memory and network didn’t change at all. Only CPU did, and this is what we see if we inspect it more closely:

I’m not an expert in this kind of stuff, so I’m just trying to figure out what is happening, but it seems to me that the cause is something on the server other than our Remix app / Postgres database, since we didn’t see an increase in users at all.

Do you have a tip for us to try to figure out what is going on? Can it be someone else on the shared server doing something heavy?

Edit: The “spikes” around 18:15 are our increase in users and us testing stuff, so that is as it should be. But as you can see, around 12:00 it goes up WAY more than we can make it do ourselves, even if we try.

system · January 29, 2024, 7:59pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
