Sprites fsync issues

Sprite Disk I/O Report: Severe fsync Latency

Summary

We are seeing very slow disk operations across multiple sprites. Any operation that requires durable writes (git, package installs, config writes, login) is consistently slow, taking 3+ seconds even for simple operations. The problem has been getting worse over time.

Environment

  • Affected sprites: do-svsg, chorus-test-worker (and others)

  • Filesystem: overlay on ext4 (/dev/vdb, virtio-blk)

  • Virtualization: Firecracker microVM, virtio-mmio transport

  • I/O scheduler: none

Diagnostic Results

Buffered I/O is fine

Test                                Result
Sequential write (100 MB, dd)       756 MB/s
Sequential read (100 MB, dd)        4.1 GB/s
100 small file creates (no sync)    7 ms
100 small file reads                152 ms
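The dd numbers above go through the page cache, so they say little about durable writes. As a minimal sketch (assuming GNU coreutils `dd` and `date`, and a writable directory on the affected overlay/ext4 filesystem), repeating the 100 MB write with `conv=fdatasync`, which makes dd flush the file data to disk before it exits, separates raw write bandwidth from the cost of durability. `TEST_DIR` and `time_dd_write` are hypothetical names used only for this illustration.

```bash
#!/usr/bin/env bash
# Compare buffered vs. durable sequential write time with GNU dd.
# Assumption: TEST_DIR (hypothetical; defaults to the current directory)
# is on the same overlay/ext4 filesystem as the slow workloads.
set -euo pipefail

TEST_DIR="${TEST_DIR:-$PWD}"
TEST_FILE="$TEST_DIR/dd_durability_test.bin"

# Write SIZE_MB MiB of zeros to TEST_FILE, passing extra args through to dd
# (e.g. conv=fdatasync). Prints elapsed wall-clock time in ms, then deletes the file.
time_dd_write() {
  local size_mb="$1"; shift
  local start end
  start=$(date +%s%N)
  dd if=/dev/zero of="$TEST_FILE" bs=1M count="$size_mb" status=none "$@"
  end=$(date +%s%N)
  rm -f "$TEST_FILE"
  echo $(( (end - start) / 1000000 ))
}

buffered_ms=$(time_dd_write 100)
# conv=fdatasync: dd flushes the output file's data to disk before exiting.
durable_ms=$(time_dd_write 100 conv=fdatasync)
echo "buffered: ${buffered_ms} ms, with fdatasync: ${durable_ms} ms"
```

If the fdatasync run is drastically slower than the buffered run for the same size, that would point at the flush path rather than write bandwidth.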

fsync is extremely slow

10 consecutive fsync operations took 9.56 seconds total.

Individual fsync latencies (write one byte + sync):

Call   Latency
1      3,678 ms
2      558 ms
3      184 ms
4      1,730 ms
5      221 ms

Expected: <1 ms on NVMe storage. Observed: 184 ms to 3,678 ms, roughly 180x to 3,700x slower than expected.
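To double-check the ratio arithmetic, here is a small sketch that summarises per-call latencies against a target. The function name `summarize_latencies` and the `TARGET_MS` knob are hypothetical; the 1 ms default mirrors the expectation stated above, and the input values are the five latencies from the table.

```bash
#!/usr/bin/env bash
# Summarise fsync latencies (ms) against a target latency.
# TARGET_MS is a hypothetical knob (default 1 ms, matching the expectation above).
set -euo pipefail

TARGET_MS="${TARGET_MS:-1}"

# Usage: summarize_latencies MS [MS ...]
# Prints min, max, integer mean, and how many times slower min/max are than TARGET_MS.
summarize_latencies() {
  local min="$1" max="$1" sum=0 n=0 v
  for v in "$@"; do
    if (( v < min )); then min=$v; fi
    if (( v > max )); then max=$v; fi
    sum=$(( sum + v ))
    n=$(( n + 1 ))
  done
  echo "min=${min} max=${max} mean=$(( sum / n )) slowdown=$(( min / TARGET_MS ))x-$(( max / TARGET_MS ))x"
}

# The five per-call latencies from the table above.
summarize_latencies 3678 558 184 1730 221
```

For these five calls it prints `min=184 max=3678 mean=1274 slowdown=184x-3678x`. A mean of about 1.27 s per call is the same order of magnitude as the 10-call run (9.56 s total, about 0.96 s per call).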

Additional observations

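  • "virtio_balloon: Out of puff! Can't get 1 pages" appears repeatedly in dmesg (logged when the balloon driver cannot allocate a guest page to inflate the balloon)

  • The 1-minute load average spiked to 2.86 shortly after boot, suggesting I/O wait from background processes (Linux counts tasks blocked in uninterruptible I/O sleep toward the load average)

  • The I/O scheduler is set to none on every block device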
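The observations above can be gathered in one pass. This is a minimal sketch assuming a Linux guest with sysfs and procfs: `dmesg` may require root (the script prints `unavailable` if it cannot read the kernel log), and `/proc/pressure/io` only exists on kernels with pressure stall information (PSI) enabled. The function names `show_schedulers` and `count_balloon_warnings` are hypothetical.

```bash
#!/usr/bin/env bash
# Collect I/O-related signals: per-device scheduler, load average,
# balloon "Out of puff!" count, and I/O pressure (if PSI is available).
set -euo pipefail

# sysfs lists the available schedulers per device; the active one is in brackets.
show_schedulers() {
  local f dev
  for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}   # e.g. "vdb/queue/scheduler"
    dev=${dev%%/*}         # e.g. "vdb"
    echo "$dev: $(cat "$f")"
  done
}

# Count "Out of puff" lines in the kernel log, or print "unavailable".
count_balloon_warnings() {
  local log
  if log=$(dmesg 2>/dev/null); then
    grep -c 'Out of puff' <<<"$log" || true   # grep -c prints 0 (exit 1) on no match
  else
    echo "unavailable"
  fi
}

show_schedulers
echo "loadavg (1/5/15 min): $(cut -d' ' -f1-3 /proc/loadavg)"
echo "balloon warnings: $(count_balloon_warnings)"
# PSI "some" line: share of time at least one task was stalled on I/O.
cat /proc/pressure/io 2>/dev/null || echo "io pressure: unavailable"
```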

Impact

fsync is called by virtually every tool that writes data durably:

  • git – status, commit, index updates

  • package managers – npm, pip, apt

  • databases – SQLite, Postgres

  • editors/IDEs – saving files

  • shell – writing history, config

This makes the sprites feel sluggish for all interactive use, not just specific workloads. A simple git status on a small repo takes 3+ seconds.

Reproduction

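# Write one byte to a file, run `sync`, and print how long each round took.
# Note: plain `sync` flushes dirty data on every mounted filesystem, not only
# /tmp/sync_test, so other processes' pending writes are included in the timing.
for i in 1 2 3 4 5; do
  t=$(date +%s%N)                      # start time, ns
  echo "x" > /tmp/sync_test && sync
  t2=$(date +%s%N)                     # end time, ns
  echo "fsync $i: $(( (t2 - t) / 1000000 )) ms"
done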

Expected: <10ms per call. Actual: hundreds to thousands of ms.
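Because plain `sync` flushes every filesystem, a per-file variant is closer to a true single-file fsync measurement. A minimal sketch, assuming GNU coreutils, where `dd conv=fsync` calls fsync() on the output file before exiting. `FSYNC_FILE` and `ROUNDS` are hypothetical knobs; the file should live on the overlay/ext4 filesystem under test (a tmpfs path would never touch the disk), and each timing includes dd's process startup, so expect a small constant overhead on top of the fsync itself.

```bash
#!/usr/bin/env bash
# Time "write one byte + fsync" on a single file instead of a global sync.
# FSYNC_FILE and ROUNDS are hypothetical knobs for this sketch.
set -euo pipefail

FSYNC_FILE="${FSYNC_FILE:-$PWD/fsync_probe}"

# Write one byte to FSYNC_FILE, fsync it (conv=fsync), print elapsed ms.
fsync_once_ms() {
  local start end
  start=$(date +%s%N)
  dd if=/dev/zero of="$FSYNC_FILE" bs=1 count=1 conv=fsync status=none
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
}

total=0
for i in $(seq 1 "${ROUNDS:-10}"); do
  ms=$(fsync_once_ms)
  total=$(( total + ms ))
  echo "fsync $i: ${ms} ms"
done
echo "total: ${total} ms"
rm -f "$FSYNC_FILE"
```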
