Internal Server Error on Postgres health check

Hello,

From time to time I see a 500 Internal Server Error in the Postgres health check. The failing check seems to be the CPU one, so high CPU utilisation might be what is affecting the health check.

So questions:

  1. Could the Internal Server Error be a result of the CPU utilisation? Or is something broken here?
  2. The health check update is from 15m ago. Shouldn’t that happen more frequently?
  3. Any ideas where to look regarding the CPU utilisation?

Hi @mathiasn!

  1. Yes, I believe that the internal server error is just due to the failing check and doesn’t indicate a problem with anything else.

  2. AFAIK the “last updated” column shows the last time that the health check changed status. (Admittedly, this definition of “updated” is somewhat misleading!) The health checks do indeed run more frequently.

For 3, I’m not a PG expert so I’m not great at troubleshooting this kind of thing (besides just connecting via SSH and running top), but since the checks say 44 open connections, I suspect it may just be due to load. One thing to try might be to check pg_stat_activity to look for any long-running queries that could be causing it, so something like

SELECT now() - query_start AS query_time, state, datname, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY query_time DESC;
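
Since the checks report 44 open connections, a breakdown by state might also be worth a look. This is only a rough sketch using the standard pg_stat_activity view; it shows how many connections are active, idle, or idle in a transaction, which can hint at whether the load comes from real work or from connections left open:

```sql
-- Count connections per database and state.
-- Background processes have a NULL datname and NULL state, so they are filtered out.
SELECT datname, state, count(*) AS connections
FROM pg_stat_activity
WHERE datname IS NOT NULL
GROUP BY datname, state
ORDER BY connections DESC;
```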

There’s also the pg_stat_statements module that can help identify the most frequent queries and how much time they’re taking, but you’d have to install it first.
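
A minimal sketch of what that could look like, assuming PostgreSQL 13 or newer (older versions name the columns total_time and mean_time) and assuming you are able to change the server configuration and restart it:

```sql
-- Prerequisite: shared_preload_libraries must include 'pg_stat_statements'
-- in postgresql.conf, followed by a server restart.
-- Then enable the extension once per database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statements by total execution time (times are in milliseconds).
SELECT calls,
       round(total_exec_time::numeric, 2) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

Statements with a high call count and a modest mean time can add up to as much CPU as a single slow query, so it may help to sort by calls as well.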

Hope some of this helps!

Thank you.

  1. Indeed, looks like it. Not very intuitive though. But now I know.
  2. Looks like these checks run every 15s… So yeah “last updated” is misleading. Or at least I understood it differently.

For the last point: I actually checked that already and everything looks fine. I will dig deeper.

Thanks!
