I wanted to get a better understanding of when we need to scale our application and how to track performance, but I’m not quite sure how to read or test this. I assumed 0 to 1 on the Y axis meant 0% to 100% CPU usage, but I see spikes going over 1, up to about 1.5, so I’m not sure how to interpret a lot of the charts on the metrics tab.
Someone from Fly may need to correct this, but my understanding is that the number is the count of processes running or waiting to run, relative to the CPU. So the number can exceed 1, even with 1 CPU.
E.g. I found this explanation of general load average on Linux, which I assume follows the same idea as the number reported for a Firecracker VM.
Load average: 1.00, 0.40, 3.35
On a single core system this would mean:
- The CPU was fully (100%) utilized on average; 1 process was running on the CPU (1.00) over the last 1 minute.
- The CPU was idle by 60% on average; no processes were waiting for CPU time (0.40) over the last 5 minutes.
- The CPU was overloaded by 235% on average; 2.35 processes were waiting for CPU time (3.35) over the last 15 minutes.
On a dual-core system this would mean:
- One CPU was 100% idle on average and one CPU was being used; no processes were waiting for CPU time (1.00) over the last 1 minute.
- The CPUs were idle by 160% on average; no processes were waiting for CPU time (0.40) over the last 5 minutes.
- The CPUs were overloaded by 135% on average; 1.35 processes were waiting for CPU time (3.35) over the last 15 minutes.
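The arithmetic behind those bullets is just the load average compared against the core count. Here is a rough Python sketch of it. The function name `describe_load` and the returned keys are made up for illustration, and it assumes the charted number behaves like a standard Linux load average, which still needs confirming for Fly. (On Linux the load average also counts processes stuck in uninterruptible I/O wait, so it is not purely a CPU figure.)

```python
def describe_load(load_avg: float, cores: int) -> dict:
    """Interpret a Linux-style load average for a machine with `cores` CPUs.

    Hypothetical helper for illustration only; not part of any Fly tooling.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # Load per core: 1.0 means every core is busy with nothing queued.
    per_core = load_avg / cores
    # Anything above the core count is work waiting for CPU time.
    waiting = max(0.0, load_avg - cores)
    # Anything below the core count is spare capacity, in cores' worth.
    spare_cores = max(0.0, cores - load_avg)
    return {
        "per_core": round(per_core, 4),
        "waiting": round(waiting, 4),
        "spare_cores": round(spare_cores, 4),
    }


# Reproduce the single-core and dual-core cases from the quoted explanation.
for cores in (1, 2):
    for load in (1.00, 0.40, 3.35):
        print(cores, load, describe_load(load, cores))
```

A `spare_cores` value of 1.6 on two cores is the "idle by 160%" case, and a `waiting` value of 2.35 on one core is the "overloaded by 235%" case.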
Awesome, so if I’m running a dedicated 4 core CPU, and my average load is around 1, does that mean I’d be running at about 25% of max capacity?
This will need confirming since I’m not sure the general Linux load average matches the shown Firecracker number. But:
With a quad-core system, if you had a load average greater than 4.0 that would indicate all cores are at 100% capacity, and any overload will result in processes waiting for CPU time.
So 1 would be fine. Source:
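For what it’s worth, a quick way to sanity-check the "load of 1 on 4 cores is about 25%" reading is to divide the load average by the core count on the machine itself. This is only a sketch: `normalized_load` is a made-up helper name, it uses Python’s standard `os.getloadavg()` and `os.cpu_count()` (Unix only), and it assumes the number in the metrics chart matches what the guest kernel reports, which hasn’t been confirmed here.

```python
import os


def normalized_load(loads=None, cores=None):
    """Return the 1, 5 and 15 minute load averages divided by the core count.

    Hypothetical helper. Pass `loads` and `cores` explicitly to check numbers
    by hand, or leave them as None to read them from the current machine.
    """
    if loads is None:
        loads = os.getloadavg()  # (1 min, 5 min, 15 min); Unix only
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() can return None
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return tuple(load / cores for load in loads)


if __name__ == "__main__":
    # The case from the question: load around 1 on a 4 core machine.
    print(normalized_load((1.0, 1.0, 1.0), cores=4))
    try:
        # Live values from whatever machine this runs on.
        print(normalized_load())
    except (OSError, AttributeError):
        print("load average is not available on this platform")
```

A value near 0.25 would line up with the 25% estimate, and values above 1.0 would mean work is queuing for CPU time.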