Database erroring far more often than usual?

dedsec · September 21, 2021, 3:59pm

Hello, our database’s leader has been spending way too much time in a critical state.
It is being restarted more than 10 times a day, which is more than in the whole of last month.
Some of the blame is on us, since we have to move the events thing out of the DB ASAP, but it was stable not long ago.
So we need to know: what has changed?

Also, this is the error message being logged:

Description
HTTP GET http://172.19.0.82:5500/flycheck/vm: 500 Internal Server Error Output: "[✗] system spent 5.5 of the last 10 seconds waiting for cpu
[✓] 8.93 GB (91.4%!)(MISSING) free space on /data/
[✓] load averages: 0.15 0.23 0.04
[✓] memory: 0.0s waiting over the last 60s
[✓] io: 0.0s waiting over the last 60s"
Status
Critical
Entity
database-8330c4ac
Check
vm

Thanks!

kurt · September 21, 2021, 4:05pm

That many restarts probably means the DB is overloaded. Have you upgraded RAM or VM size?

Check current RAM with fly vm scale
Run fly vm status <id> of the VM that’s restarting to see a few more details on restarts
fly logs -i <id> will show you the most recent logs, normally when it restarts there will be some errors in logs

dedsec · September 21, 2021, 4:25pm

Thanks, but that doesn’t make sense: we don’t have a lot of traffic, and we have already upgraded. Here are our current VM resources:

VM Resources for database
        VM Size: dedicated-cpu-1x
      VM Memory: 2 GB
          Count: 3

Any help?

kurt · September 21, 2021, 4:42pm

It looks like it was restarting because that CPU time health check kept failing. We just deployed a new CPU health check to your database; it’s a little less aggressive (it looks at the last minute instead of the last 10s). If that fixes things, I think you’re good. If you see more CPU wait time check failures, it likely means you need to upgrade your VM to dedicated-cpu-2x.
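
To illustrate why a one-minute window is less aggressive than a 10-second one, here is a minimal Python sketch. It assumes the check reads something like Linux pressure stall information (the `some avg10=... avg60=...` line format of `/proc/pressure/cpu`); the thread does not confirm how the real check is implemented. The function names, the sample line, and the 50% threshold are all hypothetical.

```python
# Hypothetical sketch only: the thread does not say how the vm check works.
# Assumes input in the Linux PSI line format used by /proc/pressure/cpu.

def parse_psi_line(line: str) -> dict:
    """Parse a line such as 'some avg10=55.00 avg60=9.20 avg300=1.90 total=1234'."""
    kind, *fields = line.split()
    values = {"kind": kind}
    for field in fields:
        key, _, raw = field.partition("=")
        values[key] = float(raw)
    return values


def seconds_waiting(avg_percent: float, window_seconds: int) -> float:
    """Convert a PSI average (percent of the window stalled) into seconds."""
    return avg_percent / 100.0 * window_seconds


def cpu_check_passes(psi: dict, window_key: str, threshold_percent: float = 50.0) -> bool:
    """Pass when the stalled share of the chosen window is at or below the threshold.

    threshold_percent is an assumed value, not the one the real check uses.
    """
    return psi[window_key] <= threshold_percent


# A made-up sample: a short CPU spike that dominates the 10s window
# but is diluted over the 60s window.
sample = parse_psi_line("some avg10=55.00 avg60=9.20 avg300=1.90 total=123456")

short_window_ok = cpu_check_passes(sample, "avg10")  # 55% of 10s stalled
long_window_ok = cpu_check_passes(sample, "avg60")   # 9.2% of 60s stalled

print(f"10s window: {seconds_waiting(sample['avg10'], 10):.1f}s waiting, pass={short_window_ok}")
print(f"60s window: {seconds_waiting(sample['avg60'], 60):.1f}s waiting, pass={long_window_ok}")
```

With this sample, the same brief spike fails a 10-second check but passes a 60-second one, which matches the behaviour described above. Failures that persist over the longer window point to sustained CPU contention rather than a momentary burst.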
