Postgres app claiming instance is hitting resource limits

miker · August 7, 2024, 6:57pm

Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help. [✗] cpu: system spent 1.09s of the last 10 seconds waiting on cpu (30.91µs)

I’m seeing this showing up in my logs after noticing a performance decrease in my app. I’m not sure what the issue is, as the volume is less than 50% used.

Instance has been restarted, but the issue persists.

Can someone assist?
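
To put the health check message above in perspective, its two figures can be converted into a fraction of the sampling window. This is a minimal Python sketch: the regular expression and the helper name `cpu_wait_fraction` are assumptions inferred from the single message quoted in this thread, not a documented log format.

```python
import re

# Assumed pattern, inferred from the one health check message quoted in this thread.
PATTERN = re.compile(
    r"system spent (?P<stalled>[\d.]+)s of the last (?P<window>[\d.]+) seconds waiting on cpu"
)


def cpu_wait_fraction(message: str) -> float:
    """Return the share of the window that the system spent waiting on CPU."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("message does not match the expected health check format")
    return float(match.group("stalled")) / float(match.group("window"))


message = "[✗] cpu: system spent 1.09s of the last 10 seconds waiting on cpu (30.91µs)"
print(f"{cpu_wait_fraction(message):.1%}")
```

For the quoted message this works out to roughly 11% of the 10-second window spent waiting for CPU time.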

IH4xx3R · August 7, 2024, 6:59pm

same problem https://community.fly.io/t/high-steal-cpu-usage-2/21176

miker · August 7, 2024, 7:02pm

Thanks – that time period looks about the same as my instance.

miker · August 7, 2024, 7:05pm

Sorry, actually shorter, but still, out of the blue.

miker · August 7, 2024, 7:12pm

Identified - We have identified a bad commit that has disrupted some of our platform operations and are working to roll back quickly.
Aug 07, 2024 - 19:11 UTC

I assume this is it…

roadmr · August 7, 2024, 7:22pm

Hi @miker,

Can you please share a machine ID from your app so I can look it up and diagnose further? Ideally, the machine from which you got that load avg graph you showed earlier.

FWIW I don’t think it’s related to the incident you mentioned

Daniel

miker · August 7, 2024, 7:27pm

Hi Daniel – machine ID is 3d8d3e5bee6d18. Managed to restart it once, but not again - stuck waiting on lease. Logs now say the database is malfunctioning.

roadmr · August 7, 2024, 7:37pm

Hi @miker,

So, unfortunately, the restart was probably affected by the incident. If possible, try restarting the machine again now that the incident is resolved.

If you care about data availability, it’s usually a good idea to have three Postgres machines in a cluster; that way, if one goes down, the remaining two can still provide service.

I think I know what’s going on, stand by, I’ll report back in a few minutes.

IH4xx3R · August 7, 2024, 7:38pm

My machine is now spamming this message into the logs (8x/s), and the 5-minute load average nearly peaked at 100%:

sentinel | 2024-08-07T19:31:21.629Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "9da7df65", "keeper": "41d10d41dd2"}

Machine ID: 9080292a666078
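
When a line like the one above floods the logs, it can help to split it into fields so occurrences can be filtered or counted per keeper. This is a rough Python sketch, assuming the layout of the single pasted line (a `sentinel | ` prefix, then tab-separated timestamp, level, source location, and message, followed by a JSON object); the function name `parse_sentinel_line` is hypothetical.

```python
import json


def parse_sentinel_line(line: str) -> dict:
    """Split one sentinel log line into its parts (layout assumed from the pasted line)."""
    # Drop the "sentinel | " process prefix.
    process, _, rest = line.partition(" | ")
    # The remaining fields are assumed to be tab-separated, ending in a JSON object.
    timestamp, level, source, message, fields = rest.split("\t", 4)
    return {
        "process": process.strip(),
        "timestamp": timestamp,
        "level": level,
        "source": source,
        "message": message,
        "fields": json.loads(fields),
    }


line = (
    "sentinel | 2024-08-07T19:31:21.629Z\tWARN\tcmd/sentinel.go:276\t"
    'no keeper info available\t{"db": "9da7df65", "keeper": "41d10d41dd2"}'
)
parsed = parse_sentinel_line(line)
print(parsed["level"], parsed["message"], parsed["fields"]["keeper"])
```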

miker · August 7, 2024, 7:41pm

Thanks Daniel – restart worked this time. I can certainly add machines in time, when the service I’m running starts getting some money in. Right now, it’s free.

roadmr · August 7, 2024, 7:45pm

A single-machine database is free at that scale, but how much is your data worth?

On a more serious note - we spotted some suspicious activity on the host where your machine lives, we’ve cleared that out so your database should now respond more quickly and the CPU health check should also clear up.

Regards,

miker · August 7, 2024, 7:48pm

Well, there is the volume with snapshots, but you’re not wrong at all

Thanks! I’ll monitor for the next few mins and see what happens.

miker · August 7, 2024, 7:54pm

Looks like it’s slowly recovering (CPU-wise) and responses appear to be quicker as well. Thanks for the quick assist!

