I’ve very recently started using Managed Postgres. Last night, for the second time, CPU usage suddenly shot up from ~0 to 100% and stayed there. Performance is terrible in this state, and many queries time out.
None of the apps connected to the database had any queries or activity at the time this started.
In fact, every app connected to this database is currently scaled to 0. There is no activity whatsoever and no connections from the apps, yet CPU usage is still at 100%!
In this state, connecting to the database with `fly mpg connect` or a proxy often fails with `Error: tunnel unavailable for organization personal: Error contacting Fly.io API when probing “personal”: timed out (context deadline exceeded)`. When it does succeed, `pg_stat_activity` shows 11 rows, which (other than the psql session itself) are presumably all internal Fly things (_crunchyrepl, fly-monitoring, “Patroni heartbeat”, …).
The only activity shown in the MPG logs is a high rate of SSH sessions from admin-bot@fly.io, multiple per second:
14:28:33 hallpass: 2026/04/04 14:28:32 INFO New SSH session email=admin-bot@fly.io verified=false
14:28:33 hallpass: 2026/04/04 14:28:33 INFO New SSH session email=admin-bot@fly.io verified=false
14:28:33 hallpass: 2026/04/04 14:28:33 INFO New SSH session email=admin-bot@fly.io verified=false
14:28:33 hallpass: 2026/04/04 14:28:33 INFO New SSH session email=admin-bot@fly.io verified=false
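To put a number on that rate, the log lines above can be tallied per second. This is a minimal Python sketch, assuming the lines keep exactly the format shown above; the function name `sessions_per_second` and the regex are my own and not part of any Fly.io tooling:

```python
import re
from collections import Counter

# Assumption: every relevant line looks like the hallpass lines quoted above.
# Lines in any other format are silently skipped.
LINE_RE = re.compile(
    r"hallpass: (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"INFO New SSH session email=(\S+)"
)


def sessions_per_second(lines):
    """Count 'New SSH session' entries per (timestamp, email) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# The four lines from the log excerpt above.
sample = [
    "14:28:33 hallpass: 2026/04/04 14:28:32 INFO New SSH session email=admin-bot@fly.io verified=false",
    "14:28:33 hallpass: 2026/04/04 14:28:33 INFO New SSH session email=admin-bot@fly.io verified=false",
    "14:28:33 hallpass: 2026/04/04 14:28:33 INFO New SSH session email=admin-bot@fly.io verified=false",
    "14:28:33 hallpass: 2026/04/04 14:28:33 INFO New SSH session email=admin-bot@fly.io verified=false",
]

for (timestamp, email), count in sorted(sessions_per_second(sample).items()):
    print(timestamp, email, count)
```

Run over a longer stretch of the logs, this would show whether the session rate is steady or whether it jumps around the time the CPU spikes.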
Any suggestions?
Edit: memory usage per the Fly dashboard is 619MiB on a 1GB plan, so thrashing seems unlikely. Since this is MPG, I can’t SSH into the instance and look at the metrics myself, which is rather frustrating.
MPG is starting to look like the worst of both worlds: you don’t have enough access to debug database issues yourself, but you also can’t escalate to a Fly engineer without paying more for a support plan, so you’re just stuck in the middle…
