Managed Postgres cluster stuck at 100% CPU with no queries or activity

I’ve only recently started using Managed Postgres. Last night, for the second time, CPU usage suddenly shot up from ~0% to 100% and stayed there. Performance is terrible in this state; many queries time out.

None of the apps connected to the database had any queries or activity at the time this started.

In fact, every app connected to this database is currently scaled to 0. There is no activity whatsoever and no connections from any app. And CPU usage is still at 100%!

In this state, connecting to the database with `fly mpg connect` or `fly mpg proxy` often fails; you just get `Error: tunnel unavailable for organization personal: Error contacting Fly.io API when probing “personal”: timed out (context deadline exceeded)`. When it does succeed, `pg_stat_activity` shows 11 rows, which (apart from the psql session itself) are presumably all internal Fly processes (`_crunchyrepl`, `fly-monitoring`, “Patroni heartbeat”, …).

The only activity shown in the MPG logs is a high rate of SSH sessions from admin-bot@fly.io, several per second:

14:28:33 hallpass: 2026/04/04 14:28:32 INFO New SSH session email=admin-bot@fly.io verified=false
14:28:33 hallpass: 2026/04/04 14:28:33 INFO New SSH session email=admin-bot@fly.io verified=false
14:28:33 hallpass: 2026/04/04 14:28:33 INFO New SSH session email=admin-bot@fly.io verified=false
14:28:33 hallpass: 2026/04/04 14:28:33 INFO New SSH session email=admin-bot@fly.io verified=false
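To put a number on “several per second”, you could export the log lines and count new SSH sessions per second. The sketch below is a minimal illustration that assumes the lines have exactly the `hallpass: <date> <time> INFO New SSH session email=...` shape shown above; the function name and regex are hypothetical helpers, not part of flyctl or any Fly tooling. It buckets by the inner hallpass timestamp, which has one-second resolution.

```python
import re
from collections import Counter

# Captures the inner hallpass timestamp and the email of each new SSH session.
# Assumes the log format shown in the post; other formats will simply not match.
SESSION_RE = re.compile(
    r"hallpass: (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO New SSH session email=(\S+)"
)


def ssh_sessions_per_second(lines, email="admin-bot@fly.io"):
    """Count new SSH sessions for `email`, keyed by hallpass timestamp (1 s buckets)."""
    counts = Counter()
    for line in lines:
        m = SESSION_RE.search(line)
        if m and m.group(2) == email:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    # The four log lines quoted in the post.
    sample = [
        "14:28:33 hallpass: 2026/04/04 14:28:32 INFO New SSH session email=admin-bot@fly.io verified=false",
        "14:28:33 hallpass: 2026/04/04 14:28:33 INFO New SSH session email=admin-bot@fly.io verified=false",
        "14:28:33 hallpass: 2026/04/04 14:28:33 INFO New SSH session email=admin-bot@fly.io verified=false",
        "14:28:33 hallpass: 2026/04/04 14:28:33 INFO New SSH session email=admin-bot@fly.io verified=false",
    ]
    for second, n in sorted(ssh_sessions_per_second(sample).items()):
        print(second, n)
    # prints:
    # 2026/04/04 14:28:32 1
    # 2026/04/04 14:28:33 3
```

Running it over a longer export would show whether the SSH rate rises in step with the CPU spike, which could be useful evidence to attach to a support ticket.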

Any suggestions?

Edit: memory usage according to the Fly dashboard is 619 MiB on a 1 GB plan, so thrashing seems unlikely. Because this is MPG, I can’t SSH into the instance and look at the metrics myself, which is rather frustrating.

MPG is starting to look like the worst of both worlds: you don’t have enough access to debug database issues yourself, but you also can’t escalate to a Fly engineer without paying more for a support plan, so you’re just stuck in the middle…

Hi… You actually can contact Fly Support with questions specific to Managed Postgres; support comes bundled with MPG, so you don’t need to pay for a separate plan:

> Organizations with a Managed Postgres (MPG) database cluster also have access to our Support Portal for issues related to MPG.
>
> […]
>
> You can access it from your Fly.io dashboard by clicking the Support tab.

(This is distinct from the Community Support item in the dropdown menu, which leads here, to the community forum. I suspect the two different “Support” links cause a lot of confusion among users in general…)


Aha, right you are, thank you!

I’d missed that because the only support link visible from the `/managed_postgres/` dashboard is the community support link (under Resources). To get to Managed Postgres paid support, you have to leave the MPG dashboard, go to the top-level dashboard, and click Support from there. Unintuitive indeed.

I’ve submitted a ticket there, fingers crossed…


I had a very similar issue with my cluster about two weeks ago, and it’s indeed quite frustrating not having enough access to perform “low-level” operations on the cluster.

When I reached out to support, they identified that the machine was being throttled, but that kind of visibility isn’t available to us as end users.

In the end, I found out that it was actually our fault. We ran a query that deleted around 200k rows, and that created a bottleneck, although in my opinion that isn’t a large enough volume to cause throttling.

The workaround that solved it for me was scaling the cluster’s disk from 10 GB to 11 GB. That forced the machines to restart, and the throttling stopped. After that, I purged the data in a more controlled way.
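The post doesn’t say what “a more controlled way” was. One common approach is to delete in small batches and commit after each one, so that no single transaction has to touch all 200k rows at once. The sketch below illustrates that pattern under stated assumptions: the table `events`, its `id` and `created_at` columns, and the batch size are all hypothetical, and it uses Python’s built-in `sqlite3` as a stand-in so it runs anywhere. The same `DELETE ... WHERE id IN (SELECT ... LIMIT n)` statement is valid in PostgreSQL; through a driver such as psycopg you would use `%s` placeholders instead of `?`.

```python
import sqlite3
import time


def delete_in_batches(conn, cutoff, batch_size=1000, pause_s=0.0):
    """Delete rows from the hypothetical `events` table with created_at < cutoff,
    at most `batch_size` rows per transaction. Returns the total rows deleted."""
    total = 0
    while True:
        # Select one batch of primary keys, then delete only those rows.
        # This subquery form is accepted by both SQLite and PostgreSQL.
        cur = conn.execute(
            "DELETE FROM events WHERE id IN ("
            "  SELECT id FROM events WHERE created_at < ? ORDER BY id LIMIT ?"
            ")",
            (cutoff, batch_size),
        )
        conn.commit()  # keep each transaction short
        deleted = cur.rowcount
        total += deleted
        if deleted < batch_size:  # last (partial or empty) batch: nothing left to delete
            return total
        if pause_s:
            time.sleep(pause_s)  # optional breathing room between batches


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at INTEGER)")
    conn.executemany(
        "INSERT INTO events (id, created_at) VALUES (?, ?)",
        [(i, i) for i in range(10_000)],
    )
    conn.commit()
    removed = delete_in_batches(conn, cutoff=7_500, batch_size=1_000)
    remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    print(removed, remaining)  # 7500 2500
```

The `pause_s` parameter is there because spacing batches out trades total runtime for lower sustained load, which may matter on a small, CPU-limited instance.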

Anyway, yes, I agree that in the current MPG model we’re stuck in the worst of both worlds. I didn’t even have access to terminate the connection that was running that long query :smiling_face_with_tear:

1 Like

Do `pg_cancel_backend` or `pg_terminate_backend` work in MPG? I’d have thought you could at least kill your own long-running queries.

@halfer Unfortunately not; the MPG user can’t terminate even its own connections :confused:

Ooh, it might be worth making a new top-level post here, marked as a feature request. If it isn’t possible with those functions because of the managed architecture, I’m sure they could add something to the web console or the CLI.
