Possible network problem (periodic spiking latency) in LHR/on host 81b8?

I have automated TCP health-checks that are regularly failing (and quickly recovering) at the same times, between VMs on the following host pairs:

79f0 to 81b8
5423 to 81b8
d88f to 81b8

ICMP ping6 tests (at least d88f to 81b8 and 5423 to 81b8; I've not yet tried other cross-VM combinations) also show a large increase in response times (e.g. 6484 ms) at the same times as the TCP health-check failures.

Though I've yet to gather a definitive body of evidence, is there a possible network problem (periodic spiking latency) either on a specific LHR host (81b8), or in LHR in general?

This appears to occur for a few to ten seconds at a time, every 15-20 minutes or so (sometimes longer).

After further investigation, it appears I have another app on host 81b8. Though it isn't being health-checked (it doesn't require Fly private network/internal comms), it also appears to suffer regular latency spikes across the internal network.

In summary: for ping6 from various LHR app VMs (single VM per app) to various other LHR apps (single VM per app), only pings destined for apps hosted on 81b8 (two of them) display this issue.

As far as I can tell the problem packets are delayed in arriving at 81b8, in some cases up to 8 seconds after being sent (stuck in a buffer? - 81b8 RX/Fly Internal Wireguard/etc?).

If possible, please could someone from Fly undertake the following:

Spin up some app VMs (single VM per app), at least a couple on LHR host 81b8, then on a non-81b8 app VM run (in the background/parallel):

ping6 <dst-app-name1>.internal -D > ping.<src-app-name>-<dst-app-name1>.txt
ping6 <dst-app-name2>.internal -D > ping.<src-app-name>-<dst-app-name2>.txt
ping6 <dst-app-name3>.internal -D > ping.<src-app-name>-<dst-app-name3>.txt
ping6 <dst-app-name4>.internal -D > ping.<src-app-name>-<dst-app-name4>.txt
ping6 <dst-app-name5>.internal -D > ping.<src-app-name>-<dst-app-name5>.txt

Then also run:

watch 'egrep "time=[0-9]{4} ms" ping.<src-app-name>-*.txt | wc -l'

I suspect any increase in the count (possibly a few times an hour) will be isolated to only those destination apps that are located on LHR host 81b8.

I appreciate that ICMP isn't the best way to test, but it is representative of when the actual problem (TCP issues) occurs, and does seem to reflect the (as far as I can tell) host-specific nature of the problem.
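For a more TCP-like measurement, something along these lines could be run alongside the pings. This is only a rough sketch: it assumes the destination app listens for HTTP on port 8080 over the private network, so adjust the hostname/port to whatever your app actually exposes.

# Hypothetical TCP connect-time probe (port 8080 is an assumption - use your app's real port).
while true
do
    t=$(curl -6 -s -o /dev/null -m 15 -w '%{time_connect}' "http://<dst-app-name>.internal:8080/")
    # Log only connects that took longer than 1 second to establish.
    if awk -v t="$t" 'BEGIN { exit !(t > 1) }'; then
        echo "$(date -u) tcp connect to <dst-app-name> took ${t}s"
    fi
    sleep 1
done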

If (TBC) packets are periodically going walkabout for seconds at a time within Fly's network, this could cause all sorts of hard-to-diagnose issues (slow responses/disconnects/etc.) for people's Fly apps (possibly just on LHR 81b8) that are reliant on the Fly private network (e.g. PostgreSQL).

If anyone suspects they may have this problem in future, or in other regions or on other hosts, the following Bash script (which could no doubt be improved) may help confirm whether this is the case:

#!/bin/bash
# Fly Private Network ICMP Latency Checker v0.1
start_timestamp=$(date +"%s")

# Discover all apps in the organisation via the internal DNS TXT record.
apps_dig=$(dig +short txt _apps.internal @fdaa::3 | cut -d '"' -f 2)
apps_arr=($(echo $apps_dig | tr "," "\n"))

# Start a background ping6 to every same-region IPv6 address of every app,
# logging only replies that took 1000-99999 ms (i.e. more than 1 second).
for dst_app_name in "${apps_arr[@]}"
do
    app_ip_arr=($(dig +short aaaa $FLY_REGION.$dst_app_name.internal @fdaa::3))
    for app_ip in "${app_ip_arr[@]}"
    do
        ping6 $app_ip -D | stdbuf -o0 egrep "time=[0-9]{4,5} ms" >> ~/flypingtest.$FLY_REGION.$FLY_APP_NAME.$HOSTNAME.$dst_app_name.log &
    done
done

# Show a per-log-file line count every 60 seconds until interrupted (Ctrl-C).
watch --interval 60 "wc -l ~/flypingtest.*"
killall ping6

# Summarise: for each slow reply logged since the script started, work out
# (to the nearest second) when the original request was sent.
mapfile -t logs_arr < <( egrep --with-filename time ~/flypingtest.* )
for log_line in "${logs_arr[@]}"
do
    log_timestamp=$(echo $log_line | egrep -o "[0-9]{10}" | head -n 1)
    if [ -n "$log_timestamp" ] && [ $log_timestamp -gt $start_timestamp ]; then
        response_delay_time=$(echo $log_line | egrep -o "time=[0-9]{4,5}")
        response_delay_ms=$(echo $response_delay_time | cut -d "=" -f 2)
        response_delay_s=$((response_delay_ms / 1000))
        request_sent_timestamp=$((log_timestamp-response_delay_s))
        request_sent_date=$(date -d @$request_sent_timestamp)
        echo $request_sent_date $log_line
    fi
done

Note: it requires that the Fly VM within which it is running has Bash/dig/etc installed.

The script will ping6 all of your apps' IPv6 addresses within the same region as the VM you're running it from. It only logs pings that take more than 1 second to receive a response, which I don't think is an unreasonably low bar for alerting on LAN-like (same-region Fly private networking) latency.

Once running, the script shows you a line count for each log file; you can then stop it with Ctrl-C and it'll display the relevant log entries that have occurred since the script was started. I'd leave it running for an hour or so to get an idea of how often the problem occurs, and which apps' VMs may be impacted. Running the script from multiple app VMs (i.e. multiple source VMs, hopefully on multiple Fly hosts) in parallel may also help.
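To run it, something like this works (assuming you've saved the script as /root/flypingtest.sh inside the VM; the app name is a placeholder):

fly ssh console -a <src-app-name>   # from your workstation
bash /root/flypingtest.sh           # inside the VM; Ctrl-C the watch when done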

I'm currently seeing response times of up to 8 seconds for 2 app VMs on LHR host 81b8, occurring ~10-20 times an hour.


We are looking into this, but for now we don’t have any solid theories.

It does look like it’s network-related (of course).

I have been running a ping6 over the private network (fdaa) between 2 apps, one on 81b8 and one on another london host.

After almost 2 hours the worst I've seen is 16 ms (which isn't great, but it's not nearly as bad as 8 seconds).

If you’re running this test continuously, have you observed issues in between 19:00 and 21:00 UTC?

Thank you for the reproduction steps btw! We appreciate users troubleshooting :slight_smile:

Though less than earlier today, yes - I still observed some spiking between those times.

Some multi-second examples (albeit I accept they’re outside of the specified window):

SRC VM1 > DST VM1 on 81b8:
Sat Nov 12 21:29:03 UTC 2022 .... icmp_seq=18747 ttl=62 time=5025 ms
Sat Nov 12 21:29:04 UTC 2022 .... icmp_seq=18748 ttl=62 time=4010 ms
Sat Nov 12 21:29:06 UTC 2022 .... icmp_seq=18749 ttl=62 time=2986 ms
Sat Nov 12 21:29:07 UTC 2022 .... icmp_seq=18750 ttl=62 time=1962 ms

SRC VM1 > DST VM2 on 81b8:
Sat Nov 12 20:59:11 UTC 2022 .... icmp_seq=16974 ttl=62 time=5771 ms
Sat Nov 12 20:59:12 UTC 2022 .... icmp_seq=16975 ttl=62 time=4749 ms
Sat Nov 12 20:59:13 UTC 2022 .... icmp_seq=16976 ttl=62 time=3727 ms
Sat Nov 12 20:59:14 UTC 2022 .... icmp_seq=16977 ttl=62 time=2703 ms
Sat Nov 12 20:59:15 UTC 2022 .... icmp_seq=16978 ttl=62 time=1680 ms

SRC VM2 > DST VM1 on 81b8:
Sat Nov 12 21:29:03 UTC 2022 .... icmp_seq=18762 ttl=62 time=5403 ms
Sat Nov 12 21:29:04 UTC 2022 .... icmp_seq=18763 ttl=62 time=4379 ms
Sat Nov 12 21:29:05 UTC 2022 .... icmp_seq=18764 ttl=62 time=3354 ms
Sat Nov 12 21:29:06 UTC 2022 .... icmp_seq=18765 ttl=62 time=2330 ms
Sat Nov 12 21:29:07 UTC 2022 .... icmp_seq=18766 ttl=62 time=1306 ms

SRC VM2 > DST VM2 on 81b8:
Sat Nov 12 20:59:11 UTC 2022 .... icmp_seq=16990 ttl=62 time=5338 ms
Sat Nov 12 20:59:12 UTC 2022 .... icmp_seq=16991 ttl=62 time=4325 ms
Sat Nov 12 20:59:13 UTC 2022 .... icmp_seq=16992 ttl=62 time=3301 ms
Sat Nov 12 20:59:14 UTC 2022 .... icmp_seq=16993 ttl=62 time=2277 ms
Sat Nov 12 20:59:15 UTC 2022 .... icmp_seq=16994 ttl=62 time=1253 ms

Points of note:

  1. Despite (AFAIK) being on the same 81b8 host, the problem doesn't appear to occur on both destination VMs (two apps, single VM per app) at the same time.
  2. It does, however, impact the same destination VM (on 81b8) at the same time from two different source VMs.
  3. Per the logs above, the response times decrease as time elapses (per the script's output, the dates shown are when the initial request was sent). It's like a buffered window: all the packets arriving at 81b8 are held in a buffer for the duration of the window (e.g. 5 seconds) before hitting the 81b8 VM. (See the sketch after this list.)
  4. My most recent/prolonged tests have been run from two source VMs to six destination VMs. Only the destination VMs on host 81b8 show this issue.
  5. Not sure if it makes a difference, but my apps are shared-CPU (mentioned in case it's a noisy-neighbour type problem, or in case the apps you're testing against on 81b8 are dedicated-CPU).
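To check point 3 at per-packet precision, a rough sketch like the following (assuming the log lines are standard ping -D output, i.e. [<epoch>.<fraction>] ... time=<ms> ms, and the filename is a placeholder) derives the approximate send time of each reply. For the delayed replies, the derived send times should cluster into one narrow window:

awk 'match($0, /time=[0-9]+/) {
    rtt_ms = substr($0, RSTART + 5, RLENGTH - 5)     # RTT in milliseconds
    recv = substr($1, 2, length($1) - 2)             # strip the [ ] around the receive epoch
    printf "sent~%.3f recv=%s  %s\n", recv - rtt_ms / 1000, recv, $0
}' ping.<src-app-name>-<dst-app-name>.txt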

I’ll run some more expansive tests/debugging tomorrow.

This morning I've been looking at traffic in both directions to/from an 81b8 VM (specifically one that's doing very little). Unlike my initial RX-only theory, it appears that all (TX and RX) traffic disappears (to/from the VM in question) into network no-man's-land before being released at the end of the problem period (which ranges from 2 seconds to 10 seconds). The VM's dmesg doesn't indicate anything during these times.

Timestamps and ICMP sequence numbers, checked on both the sender and receiver (for inter-VM ping6), support this.
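For anyone wanting to make the same comparison, a minimal capture on each VM along these lines (an assumption that tcpdump is installed and that the private-network interface is eth0 - check with ip -6 addr) records kernel timestamps that can be lined up by icmp_seq afterwards:

# Capture ICMPv6 traffic with line-buffered, human-readable timestamps on each VM.
tcpdump -l -ni eth0 -tttt icmp6 > /dev/shm/icmp6.$HOSTNAME.txt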

I've also noticed that this impacts all traffic (or at least all of my ping tests), not just private networking:

  1. Private networking inter-VM ipv6 ping6
  2. Internet ipv6 ping6
  3. Internet ipv4 ping

With the matching sequence numbers I'm fairly certain the traffic isn't being dropped, as it is eventually received (either at the 81b8 VM or from the 81b8 VM), but only after the problem period.

Given it appears to impact all traffic, the following should be far easier than the script above (and doesn’t require two VMs):

echo $FLY_PUBLIC_IP | cut -d ":" -f 4

If the output of the above command is 81b8, then run:

ping google.com -D | stdbuf -o0 egrep "time=[0-9]{4,5} ms"

Only replies that took longer than 1 second will be shown. If you see many of these over an hour or so, you may have the issue.
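Or, combining the two steps into one convenience check (same commands as above, just wrapped in an if):

if [ "$(echo $FLY_PUBLIC_IP | cut -d ':' -f 4)" = "81b8" ]; then
    ping google.com -D | stdbuf -o0 egrep "time=[0-9]{4,5} ms"
else
    echo "This VM is not on host 81b8"
fi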


I now suspect it isn’t a network problem but some sort of CPU locking/IO wait problem (possibly only on shared CPUs?)…

If I run (i.e. to disk):

top -b -d 1 > /root/top.txt

I've noticed that missing top samples correlate with the problem periods; the following script flags gaps between consecutive top headers:

#!/bin/bash
# Flag gaps between consecutive "top - HH:MM:SS" header lines in a batch-mode top log.
mapfile -t logs_arr < <( grep "top - " /root/top.txt | cut -d " " -f 3 )
prev_log_seconds="00"
line_num=1
for log_line in "${logs_arr[@]}"
do
    # Seconds field of the HH:MM:SS timestamp (base 10 to avoid octal issues with leading zeros).
    current_log_seconds=$(echo $log_line | cut -d ":" -f 3)
    seconds_diff=$((10#$current_log_seconds-10#$prev_log_seconds))
    prev_log_seconds=$current_log_seconds
    # With a 1-second interval, consecutive samples should differ by 1, 2, or -59
    # (minute wrap); anything else means top missed one or more samples.
    if [ $line_num != 1 ] && [ $seconds_diff != -59 ] && [ $seconds_diff != 1 ] && [ $seconds_diff != 2 ]; then
        echo $line_num $seconds_diff $log_line
    fi
    line_num=$((line_num+1))
done

I then ran (logging to a ramdisk, /dev/shm, instead):

top -b -d 1 > /dev/shm/top.txt

and didn't miss any top data for the problem periods, but I can now see 100.0 wa (100% I/O wait) during those times:

grep -B 2 "100.0 wa" /dev/shm/top.txt

The 81b8 VM I'm debugging on is the one that's not doing much (if anything). Per Understanding FIRECRACKER LOAD AVERAGE (Part Deux) - shared CPU, I'm not sure how the metrics top provides can be expected to differ in a Firecracker environment.

Given that this correlates with my observed problems, and only on 81b8, it would be good if someone could perform the tests above on their shared-CPU VM(s) on LHR host 81b8.
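If keeping a full top log is overkill, a rough alternative is to sample /proc/stat directly at 1-second resolution (a sketch only - inside a Firecracker guest the iowait figure is at best an approximation of what the host is doing, and the 50% threshold is arbitrary) and print just the seconds where I/O wait dominates:

#!/bin/bash
# Rough 1-second iowait sampler - flags intervals where iowait dominates.
prev_iowait=0
prev_total=0
while true
do
    # /proc/stat "cpu" line: user nice system idle iowait irq softirq steal ...
    read -r _ user nice system idle iowait irq softirq steal _ < /proc/stat
    total=$((user + nice + system + idle + iowait + irq + softirq + steal))
    diff_total=$((total - prev_total))
    diff_iowait=$((iowait - prev_iowait))
    if [ $prev_total -ne 0 ] && [ $diff_total -gt 0 ]; then
        pct=$((100 * diff_iowait / diff_total))
        # Flag samples where more than half the interval was spent in I/O wait (arbitrary cut-off).
        if [ $pct -ge 50 ]; then
            echo "$(date -u) iowait ${pct}%"
        fi
    fi
    prev_iowait=$iowait
    prev_total=$total
    sleep 1
done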

Unhelpfully, it doesn't appear as bad this morning as last night, where I think I had at least one 14-second bout.


Good troubleshooting.

Looking at I/O on that host, it might be problematic. Trying to figure out what’s using so much.

I’m not sure why it would affect your VM, but not the one I was using to test this. Perhaps the CPU assignment has something to do with it.

Looks like high I/O was caused by our state replication mechanisms.

It just so happens we’ve been working for months on a new system for this and it’s ready to try on a larger scale.

I just replaced the old software with the new one. It made a big difference in I/O usage on that host.

Hopefully your issues will be gone. Let us know if that’s not the case. We will be rolling this new system out to more hosts this week.


I’ve not seen any health-check failures since ~1AM UTC - so this looks to have fixed it, thanks.

If it's not done already, would it be possible for Fly to monitor, and alert Fly staff on, high host I/O?

I appreciate that in this sort of environment I/O may normally fluctuate, but if (TBC) the I/O seen on the host was outside the norm (in range and duration), enough to warrant investigation, it could have been dealt with proactively.

It's entirely possible that alerting on this specific situation may be difficult (I think the longest period I'd observed was ~14 seconds*) due to any peaks in I/O being averaged down by monitoring tools' sample periods - e.g. 14 seconds at 100% I/O wait averaged over a 5-minute sample works out to under 5%. If however there is some sort of telling indicator, other than forum posts, it would be good if Fly could add it to their internal alerting.

*though "only" seconds in duration - in a real-time environment, with a high number of users, the detrimental impact on users' experience could have been rather frustrating (and quite possibly difficult to diagnose).


Awesome. I’m glad that fixed it.

We did that after this whole ordeal. I’ve replaced the sub-optimal software stack on any server with far-too-high I/O utilization.

Yes, we should've caught that proactively. We have various alerts set up for other resources, but disk I/O wasn't monitored closely. I had recently brought up that it might've been causing issues on some of our hosts, but we didn't have solid evidence that it was.


@jerome - this reoccurred last night (~01:37 UTC), again limited to just the 2 VMs on LHR 81b8. Though having the problem appear once in a blue moon isn't the end of the world, the fact that it occurs, and as far as I can tell only on 81b8, may suggest a localised problem on that host.

The earlier state-replication version may not necessarily have been the root cause (albeit a large contributory factor); possibly it was exacerbating an underlying (hardware/disk/noisy neighbour?) issue on 81b8.

I can see a large disk I/O spike around that time.

It is caused by the new state replication system :slight_smile: It doesn’t happen as often or in a sustained manner, but there are spikes.

We’re working on it.

Not sure why this particular host is more affected than others :thinking:. Maybe it has different hardware for its drives. Every other host also suffered a spike in I/O due to a flurry of updates to our state, caused by a bug in a system writing to the state.

I might add some rate limiting to make sure we're not overloading the disks.