Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help. [✗] cpu: system spent 1.09s of the last 10 seconds waiting on cpu (30.91µs)
This started showing up in my logs after I noticed a performance decrease in my app. I’m not sure what the issue is, as the volume is less than 50% used.
I’ve restarted the instance, but the issue persists.
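For anyone who wants to check the CPU figure from inside the VM: the wording of the check looks like Linux pressure stall information (PSI), though that is an assumption on my part and not something confirmed in this thread. Here is a minimal sketch that parses the standard `/proc/pressure/cpu` format and converts the 10-second average into seconds stalled. The function names and the sample values are made up for illustration.

```python
from pathlib import Path


def parse_psi(text):
    """Parse PSI text such as the contents of /proc/pressure/cpu.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    Returns a dict like {"some": {"avg10": 0.0, ...}, "full": {...}}.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}
    return result


def stalled_seconds(avg10_percent, window=10.0):
    """Convert an avg10 percentage into seconds stalled within the window."""
    return avg10_percent / 100.0 * window


# Illustrative sample only, not real output from the machine in this thread.
sample = (
    "some avg10=10.90 avg60=4.20 avg300=1.10 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
)

if __name__ == "__main__":
    psi_file = Path("/proc/pressure/cpu")
    # Fall back to the sample when PSI is not available (e.g. non-Linux hosts).
    text = psi_file.read_text() if psi_file.exists() else sample
    psi = parse_psi(text)
    avg10 = psi["some"]["avg10"]
    print(f"cpu pressure: {avg10:.2f}% of the last 10s "
          f"({stalled_seconds(avg10):.2f}s waiting on cpu)")
```

Under that assumption, 1.09s out of 10 seconds corresponds to an `avg10` of roughly 10.9%, which measures contention for CPU time and is independent of how full the volume is.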
Identified - We have identified a bad commit that has disrupted some of our platform operations and are working to roll back quickly.
Aug 07, 2024 - 19:11 UTC
Can you please share a machine ID from your app so I can look it up and diagnose further? Ideally, the machine from which you got that load avg graph you showed earlier.
FWIW, I don’t think it’s related to the incident you mentioned.
Hi Daniel – machine ID is 3d8d3e5bee6d18. Managed to restart it once, but not again - stuck waiting on lease. Logs now say the database is malfunctioning.
So, unfortunately, the restart was probably affected by the incident. If possible, try restarting the machine again now that the incident is resolved.
If you care about data availability, it’s usually a good idea to have three Postgres machines in a cluster. That way, if one goes down, the remaining two can still provide service.
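To make the arithmetic behind "three machines" concrete, here is a rough sketch. It assumes the cluster needs a strict majority of members to agree before it keeps serving, which is a common design but not something stated in this thread; the helper names are hypothetical.

```python
def majority(members):
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def tolerable_failures(members):
    """How many members can be lost while a majority is still reachable."""
    return members - majority(members)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} machine(s): majority={majority(n)}, "
              f"can lose {tolerable_failures(n)}")
```

Under that assumption, one or two machines tolerate no failures, while three tolerate the loss of one, which is why three is the usual starting point.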
I think I know what’s going on, stand by, I’ll report back in a few minutes.
Thanks Daniel – restart worked this time. I can certainly add machines in time, when the service I’m running starts bringing in some money. Right now, it’s free.
A single-machine database is free at that scale, but how much is your data worth?
On a more serious note: we spotted some suspicious activity on the host where your machine lives. We’ve cleared that out, so your database should now respond more quickly and the CPU health check should also clear up.