Sprite Disk I/O Report: Severe fsync Latency
Summary
Disk operations are very slow across multiple sprites. Any operation that requires durable writes (git, package installs, config writes, login) is consistently slow: simple operations take 3+ seconds. The issue has been getting worse over time.
Environment
- Affected sprites: `do-svsg`, `chorus-test-worker` (and others)
- Filesystem: overlay on ext4 (`/dev/vdb`, virtio-blk)
- Platform: Firecracker microVM, virtio-mmio transport
- I/O scheduler: `none`
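The environment details above can be collected with standard util-linux commands and sysfs reads. This is a minimal sketch. It assumes the backing disk is `vdb` and that `/` is the overlay mount, both taken from the report, so adjust them for other sprites. The `collect_env` helper name is hypothetical. It also reads `write_cache`, because a device reporting "write back" receives cache-flush commands on fsync.

```shell
#!/usr/bin/env bash
# Collect the environment facts listed above (collect_env is a hypothetical helper).
# Assumption: the backing block device is vdb, as in the report.
collect_env() {
  local dev=${1:-vdb}
  echo "kernel: $(uname -r)"
  # Filesystem type and source of the root mount (overlay expected).
  findmnt -n -o FSTYPE,SOURCE -T / || true
  # The active scheduler is shown in [brackets], e.g. "[none] mq-deadline".
  if [ -r "/sys/block/$dev/queue/scheduler" ]; then
    echo "scheduler($dev): $(cat "/sys/block/$dev/queue/scheduler")"
  fi
  # "write back" means the block layer sends cache flushes to this device.
  if [ -r "/sys/block/$dev/queue/write_cache" ]; then
    echo "write_cache($dev): $(cat "/sys/block/$dev/queue/write_cache")"
  fi
}

collect_env "$@"
```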
Diagnostic Results
Buffered I/O is fine
| Test | Result |
|---|---|
| Sequential write (100MB, dd) | 756 MB/s |
| Sequential read (100MB, dd) | 4.1 GB/s |
| 100 small file creates (no sync) | 7ms |
| 100 small file reads | 152ms |
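The exact `dd` invocations behind these numbers are not recorded. The sketch below is one plausible reconstruction. It assumes GNU coreutils `dd` and a hypothetical scratch file (`SCRATCH`) on the affected overlay filesystem. It contrasts a buffered write with one that ends in `fdatasync`. Only the second write waits for the device, so comparing the two should show whether the slowdown lies in the flush path rather than in raw throughput.

```shell
#!/usr/bin/env bash
# Compare buffered throughput with flush-inclusive throughput (GNU dd).
# Assumption: SCRATCH is a hypothetical path on the overlay/ext4 filesystem.
SCRATCH=${SCRATCH:-$HOME/dd_probe.bin}
SIZE_MB=${SIZE_MB:-100}

dd_buffered() {
  # Completes once data is in the page cache; no device flush is forced.
  dd if=/dev/zero of="$SCRATCH" bs=1M count="$SIZE_MB" 2>&1 | tail -n 1
}

dd_flushed() {
  # conv=fdatasync: dd calls fdatasync() on the output before exiting,
  # so the reported rate includes the time to reach stable storage.
  dd if=/dev/zero of="$SCRATCH" bs=1M count="$SIZE_MB" conv=fdatasync 2>&1 | tail -n 1
}

dd_read() {
  # Reading back right after writing is likely served from the page cache.
  dd if="$SCRATCH" of=/dev/null bs=1M 2>&1 | tail -n 1
}

echo "buffered write: $(dd_buffered)"
echo "flushed write:  $(dd_flushed)"
echo "read back:      $(dd_read)"
rm -f "$SCRATCH"
```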
fsync is extremely slow
10 consecutive fsync operations took 9.56 seconds total (about 956 ms per call on average).
Individual fsync latencies (write one byte + sync):
| Call | Latency |
|---|---|
| 1 | 3,678 ms |
| 2 | 558 ms |
| 3 | 184 ms |
| 4 | 1,730 ms |
| 5 | 221 ms |
Expected: <1 ms on NVMe storage. Observed: 184 ms to 3,678 ms (roughly 180x to 3,700x slower than expected).
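The ratios follow directly from the table. The small `awk` sketch below computes them from per-call latencies in milliseconds, measured against a configurable baseline that defaults to the 1 ms NVMe expectation. The `summarize_latencies` helper is hypothetical and not part of any existing tool.

```shell
#!/usr/bin/env bash
# Summarize per-call latencies: one integer value in ms per line on stdin.
# BASELINE_MS (default 1) is the expected latency used for the slowdown ratio.
# summarize_latencies is a hypothetical helper, not an existing tool.
summarize_latencies() {
  awk -v base="${BASELINE_MS:-1}" '
    {
      v = $1 + 0                      # force numeric comparison
      sum += v
      if (NR == 1 || v < min) min = v
      if (NR == 1 || v > max) max = v
    }
    END {
      if (NR == 0) exit 1             # no samples: nothing to summarize
      printf "n=%d total=%d ms mean=%.1f ms min=%d max=%d slowdown=%dx-%dx\n",
        NR, sum, sum / NR, min, max, min / base, max / base
    }'
}

# The five calls from the table above.
printf '%s\n' 3678 558 184 1730 221 | summarize_latencies
# → n=5 total=6371 ms mean=1274.2 ms min=184 max=3678 slowdown=184x-3678x
```

Setting `BASELINE_MS=10` compares against the looser <10 ms per-call expectation used in the reproduction.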
Additional observations
- `virtio_balloon: Out of puff! Can't get 1 pages` appears repeatedly in `dmesg`
- Load average spiked to 2.86 (1-minute average) shortly after boot, suggesting I/O wait from background processes
- No I/O scheduler is configured on any block device (all report `none`)
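The additional observations above can be rechecked on a live sprite. This sketch makes two assumptions. First, `dmesg` must be readable by the current user. Second, the kernel must expose pressure-stall information at `/proc/pressure/io`, which requires Linux 4.20+ built with PSI; this is not confirmed for these sprites. The `check_observations` helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Recheck the additional observations (check_observations is hypothetical).
check_observations() {
  local log
  # Balloon allocation failures since boot (needs permission to read dmesg).
  if log=$(dmesg 2>/dev/null); then
    echo "balloon 'Out of puff' lines: $(printf '%s\n' "$log" | grep -c 'Out of puff')"
  else
    echo "balloon 'Out of puff' lines: dmesg not readable"
  fi
  # 1-, 5-, and 15-minute load averages. Linux counts tasks in uninterruptible
  # sleep (often blocked on I/O), so load can rise while CPUs sit idle.
  echo "loadavg: $(cut -d ' ' -f 1-3 /proc/loadavg)"
  # Share of time tasks were stalled on I/O, if PSI is available.
  if [ -r /proc/pressure/io ]; then
    echo "io pressure:"
    cat /proc/pressure/io
  else
    echo "io pressure: /proc/pressure/io not available"
  fi
}

check_observations
```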
Impact
fsync is called by virtually every tool that writes data durably:
- git: status, commit, index updates
- package managers: npm, pip, apt
- databases: SQLite, Postgres
- editors/IDEs: saving files
- shell: writing history, config
This makes the sprites feel sluggish for all interactive use, not just specific workloads. A simple `git status` on a small repository takes 3+ seconds.
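The sketch below can confirm how much of a slow command's time is spent in sync-family system calls. It assumes `strace` is installed on the sprite, which this report does not establish, and `count_syncs` is a hypothetical helper name. The `-w` flag makes the `-c` summary report wall-clock latency instead of CPU time. This matters because a blocked fsync uses almost no CPU.

```shell
#!/usr/bin/env bash
# Count and time the sync-family syscalls a command makes (requires strace).
# count_syncs is a hypothetical helper; example: count_syncs git status
count_syncs() {
  # -f follows child processes; -c prints a per-syscall summary on stderr;
  # -w summarizes wall-clock time per call instead of CPU time;
  # -e restricts tracing to calls that force data to stable storage.
  # The redirections keep strace's stderr summary and drop the command's stdout.
  strace -f -c -w -e trace=fsync,fdatasync,sync,syncfs "$@" 2>&1 >/dev/null
}
```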
Reproduction
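```shell
# Time a small write followed by `sync`, five times (GNU date for %N).
# Note: bare `sync` flushes dirty data for all filesystems, not just this file.
for i in 1 2 3 4 5; do
  t=$(date +%s%N)
  echo "x" > /tmp/sync_test && sync
  t2=$(date +%s%N)
  echo "fsync $i: $(( (t2 - t) / 1000000 )) ms"
done
```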
Expected: <10ms per call. Actual: hundreds to thousands of ms.
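The loop above times a bare `sync`, which flushes dirty data for every mounted filesystem. Its latency therefore also depends on unrelated background writes. On some systems `/tmp` is a separate tmpfs mount, in which case the write itself never reaches `/dev/vdb`. The narrower sketch below times a single-file write that `dd conv=fsync` flushes. It assumes GNU coreutils and a hypothetical probe file (`PROBE`) on the ext4-backed overlay, and the `fsync_probe` helper name is also hypothetical.

```shell
#!/usr/bin/env bash
# Time a 1-byte write that dd fsyncs before exiting, N times.
# Assumption: PROBE is a hypothetical path on the ext4-backed overlay.
PROBE=${PROBE:-$HOME/.fsync_probe}

fsync_probe() {
  local n=${1:-5} i t t2
  for i in $(seq 1 "$n"); do
    t=$(date +%s%N)
    # conv=fsync: dd calls fsync() on the output (data + metadata) before exiting.
    printf 'x' | dd of="$PROBE" bs=1 count=1 conv=fsync status=none
    t2=$(date +%s%N)
    echo "fsync $i: $(( (t2 - t) / 1000000 )) ms"
  done
  rm -f "$PROBE"
}

fsync_probe 5
```

Each measurement includes the cost of starting `dd` and `date`, so some overhead per call is expected even on fast storage. Consistent results in the hundreds of milliseconds would point at the single-file fsync path rather than at system-wide flushing.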