Identifying Memory Pressure in fly-metrics

We’ve introduced a new metric to fly-metrics to identify memory pressure. If you’re unfamiliar with memory pressure, the TL;DR is that it reports how much of a given time window one or more processes spent starved of memory. We hope this metric will help us identify when a process is about to OOM.

What does it do?

For now, this metric doesn’t tell us much since OOMs happen very quickly after memory pressure is detected. But! We’re actively experimenting to see how we can better detect and inform you of OOM failures.

How do I enable it?

To enable this metric, cut a fresh deploy with fly deploy. You should then be able to see the metric in fly-metrics.net.

How do I use it?

To check out the metric, you can run the query fly_instance_memory_pressure_some{app="YOURAPPNAME"} or fly_instance_memory_pressure_full{app="YOURAPPNAME"}, replacing YOURAPPNAME with the name of your app.
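If you'd rather build these queries from a script, here is a minimal Python sketch. The two metric names and the app label come from this post; the helper names (pressure_queries, query_url) and the base URL argument are hypothetical, and the /api/v1/query path is the standard Prometheus instant-query endpoint, which we're assuming your metrics endpoint exposes.

```python
from urllib.parse import urlencode

# Metric names as given in this post.
METRICS = (
    "fly_instance_memory_pressure_some",
    "fly_instance_memory_pressure_full",
)


def pressure_queries(app: str) -> dict[str, str]:
    """Return the PromQL selector for each memory pressure metric."""
    # Escape backslashes and double quotes so the label value stays valid PromQL.
    safe = app.replace("\\", "\\\\").replace('"', '\\"')
    return {name: f'{name}{{app="{safe}"}}' for name in METRICS}


def query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for a Prometheus-compatible HTTP API.

    base_url is a placeholder (assumption): pass whatever Prometheus-compatible
    endpoint you actually query.
    """
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    for query in pressure_queries("my-app").values():
        print(query)
```

Running it with an app called my-app prints the two selectors, ready to paste into the query box.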

pressure_some represents the pressure when at least one process is starved of memory, while pressure_full represents the pressure when all processes are starved of memory at the same time.

avg10, avg60 and avg300 represent the percentage of time over the last 10s, 60s and 300s during which a process was starved of memory, while total represents the total time in microseconds that processes were starved of memory.
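These field names match the Linux pressure stall information (PSI) format found in /proc/pressure/memory. Assuming the metric is derived from that file (the post doesn't say so explicitly), here is a rough Python sketch of how the fields can be read and interpreted. The sample values are made up for illustration, and the function names are hypothetical.

```python
# Illustrative sample in the Linux PSI text format; the numbers are invented.
SAMPLE = """\
some avg10=1.50 avg60=0.40 avg300=0.10 total=123456
full avg10=0.75 avg60=0.20 avg300=0.05 total=65432
"""


def parse_pressure(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI-style lines into {"some": {...}, "full": {...}}."""
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values: dict[str, float] = {}
        for field in fields:
            key, _, raw = field.partition("=")
            values[key] = float(raw)
        result[kind] = values
    return result


def approx_stalled_seconds(avg_percent: float, window_seconds: int) -> float:
    """Convert an avgN percentage into approximate seconds stalled in that window.

    The kernel reports these as running averages, so treat this as an estimate.
    """
    return avg_percent / 100.0 * window_seconds


if __name__ == "__main__":
    pressure = parse_pressure(SAMPLE)
    some = pressure["some"]
    print("some, approx seconds stalled in last 10s:",
          approx_stalled_seconds(some["avg10"], 10))
    # total is reported in microseconds; convert to seconds for readability.
    print("some, total seconds stalled:", some["total"] / 1_000_000)
```

Reading it this way, an avg10 of 1.50 under some would mean that at least one process was starved of memory for roughly 0.15s of the last 10s.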
