We’ve been experiencing severe performance degradation on several of our Fly.io Firecracker VM instances (8 vCPU / 8 GB RAM). After extensive investigation, we’ve identified the root cause: the virtio-balloon is inflating to ~80% of guest RAM, leaving only 1.4–2 GB usable out of 8 GB.
## Summary

- VM spec: 8 vCPU (AMD EPYC), 8 GB RAM, no swap
- Symptom: load average spikes to 20–30+; all application requests degrade to 30–100 s response times
- Root cause: the virtio-balloon inflates to ~6.5 GB, leaving insufficient memory for the page cache. This triggers a kswapd thrashing loop with ~2 TB of disk re-reads in just 2 hours.
## Evidence

### Balloon inflation (from /proc/vmstat)

Affected instance:

    balloon_inflate: 1,708,800 pages (6,675 MB given to host)
    balloon_deflate:    37,632 pages (  147 MB returned)
    NET inflated:    1,671,168 pages = 6,528 MB stolen from guest

Healthy instance for comparison:

    balloon_inflate: 1,529,600 pages
    balloon_deflate:       512 pages
    NET inflated:    1,529,088 pages = 5,973 MB stolen from guest
Both instances show 75–82% of VM RAM taken by the balloon. The “healthy” one only survives because it has slightly more headroom (1,968 MB vs 1,413 MB usable).
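For reference, the MB figures are derived from the 4 KiB page counts. A minimal conversion sketch, with the affected instance's counters hardcoded (on a live VM they come from `grep balloon /proc/vmstat`):

```shell
#!/bin/sh
# Convert virtio-balloon page counters (4 KiB pages) to MB.
# Counts below are hardcoded from the affected instance; on a live VM,
# read them with: grep balloon /proc/vmstat
inflate=1708800
deflate=37632
net=$((inflate - deflate))
echo "net inflated: ${net} pages = $((net * 4 / 1024)) MB"
```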
### Memory accounting
| Metric | Affected VM | Healthy VM |
|---|---|---|
| MemTotal | 7,941 MB | 7,941 MB |
| Balloon stolen | 6,528 MB | 5,973 MB |
| Actually usable | 1,413 MB | 1,968 MB |
| Application RSS | ~325 MB | ~337 MB |
| Left for page cache | ~1,000 MB | ~1,500 MB |
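As a sanity check, the "Actually usable" and page-cache rows follow directly from the rows above them (affected-VM values hardcoded):

```shell
#!/bin/sh
# Reproduce the memory-accounting arithmetic for the affected VM.
memtotal=7941   # MB, MemTotal from /proc/meminfo
balloon=6528    # MB, net balloon inflation from /proc/vmstat
app_rss=325     # MB, application resident set size
usable=$((memtotal - balloon))
cache=$((usable - app_rss))
echo "usable: ${usable} MB; left for page cache: ~${cache} MB"
```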
### Thrashing indicators
Affected vs healthy instance:
| Metric | Healthy (7h uptime) | Affected (2h uptime) | Factor |
|---|---|---|---|
| pgmajfault | 14,727 | 8,742,890 | 594x |
| allocstall | 81 | 2,885,556 | 35,600x |
| pgscan_kswapd | 3,990,861 | 604,998,966 | 152x |
| Total disk reads | ~12 GB | ~1,982 GB | 165x |
| workingset_refault_file | 2,864,367 | 335,447,011 | 117x |
The kernel dmesg also shows a kswapd BUG/crash in shrink_slab / balance_pgdat — the kernel hit a fault while trying to reclaim memory under extreme pressure.
## Impact on application performance
During the thrashing period, WebSocket API requests degraded severely:

    API call A: 101,454 ms (normally <1 s)
    API call B:  55,214 ms (normally <5 s)
    API call C:   1,036 ms (normally ~50 ms)
## Why the working set doesn’t fit
Our application’s on-disk footprint is ~1.2 GB (Node.js with native modules including ML model weights at ~677 MB). Combined with the Node.js runtime, system libraries (/usr/lib at 523 MB), and operational data, the working set is ~2 GB. With only 1.4 GB usable after balloon inflation, the page cache is constantly evicted and re-read from the overlay filesystem layers.
The overlay root filesystem is backed by 3 read-only virtio block devices (vdd, vde, vdf at 8 GB each), and the repeated re-reads of these layers account for the ~2 TB of I/O in just 2 hours.
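The ~2 GB working-set estimate can be approximated by summing the on-disk footprint of what the application touches at runtime. A rough sketch (the path list is illustrative, not our exact layout):

```shell
#!/bin/sh
# Rough on-disk working-set estimate: sum the footprint of paths the
# application reads at runtime. Paths are examples; adjust for your image.
total=0
for p in /usr/lib /app/node_modules; do
  [ -d "$p" ] || continue           # skip paths that don't exist here
  mb=$(du -sm "$p" | cut -f1)
  echo "${p}: ${mb} MB"
  total=$((total + mb))
done
echo "approximate working set: ${total} MB on disk"
```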
## Balloon configuration issue
The virtio-balloon device (virtio0, device_id=0x0005) has the following negotiated features:

    Features: 0x6000000080000000
    Bits set: 31 (VIRTIO_F_ACCESS_PLATFORM), 61, 62
Critically NOT negotiated:

- DEFLATE_ON_OOM (bit 2): the balloon does not auto-shrink when the guest is under memory pressure
- STATS_VQ (bit 1): the host cannot query guest memory stats to make informed decisions
- FREE_PAGE_HINT (bit 3): free-page hinting is not active for dynamic adjustment
This means once the balloon inflates, the guest has no mechanism to reclaim memory even when it is thrashing to death.
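These bits can be verified from inside the guest via sysfs: `/sys/bus/virtio/devices/virtio0/features` prints one '0'/'1' character per feature bit, bit 0 first. A decoding sketch (the all-zeros string below is a stand-in; on a real VM read it from sysfs as shown in the comment):

```shell
#!/bin/sh
# Decode the balloon feature bits of interest from the sysfs bit string.
# Stand-in string (64 zeros = nothing negotiated); on a live VM use:
#   features=$(cat /sys/bus/virtio/devices/virtio0/features)
features='0000000000000000000000000000000000000000000000000000000000000000'
for entry in '1 STATS_VQ' '2 DEFLATE_ON_OOM' '3 FREE_PAGE_HINT'; do
  bit=${entry%% *}
  name=${entry#* }
  c=$(printf '%s' "$features" | cut -c $((bit + 1)))   # bit N is char N+1
  if [ "$c" = "1" ]; then
    echo "bit ${bit} (${name}): negotiated"
  else
    echo "bit ${bit} (${name}): NOT negotiated"
  fi
done
```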
## Questions / Requests

- Is this level of balloon inflation (75–82%) intentional? On most cloud platforms, VMs receive close to their advertised RAM; leaving only 1.4–2 GB usable out of 8 GB seems extremely aggressive.
- Can DEFLATE_ON_OOM be enabled? This would allow the balloon to shrink automatically when the guest is under memory pressure, preventing the thrashing death spiral.
- Can STATS_VQ be enabled? This would let the hypervisor make smarter balloon-sizing decisions based on actual guest memory utilization.
- Alternatively, could VMs be allocated more physical memory to account for balloon overhead? For example, 16 GB so that ~8 GB remains usable after inflation.
## How to reproduce

On any Firecracker VM with the balloon driver loaded:

- `cat /proc/vmstat | grep balloon`: check the net inflated page count
- `free -m`: observe actual available memory vs. MemTotal
- Run any workload with a ~2 GB working set and watch the load average spike as the page cache is exhausted
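The first two checks can be bundled into a one-shot snapshot script to attach to a report; a sketch (uses /proc directly rather than `free`, so it works on minimal images):

```shell
#!/bin/sh
# One-shot memory/balloon snapshot for bug reports.
echo '== balloon counters (4 KiB pages) =='
grep balloon /proc/vmstat || echo 'no balloon counters (driver not active?)'
echo '== memory =='
grep -E '^(MemTotal|MemFree|MemAvailable|Cached):' /proc/meminfo
echo '== reclaim pressure =='
grep -E '^(pgmajfault|allocstall|pgscan_kswapd|workingset_refault)' /proc/vmstat
```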
Happy to provide additional diagnostics. Thanks for looking into this.