I’ve set up Betterstack logging for my Fly.io apps, and my production DB is emitting the following every 15 seconds or so: FATAL: password authentication failed for user "postgres". This isn’t happening on my test database, which has no replicas, and the error only comes from the primary node. The average load metric is also higher, with a lot of transactions showing in PGAdmin. This is despite my production environment not having any traffic yet, as I’m just getting it ready for release, so load should be far lower than in my test environment.
I’m a bit lost as to what is causing these logs. Is it a replica? It happens even if I spin up an entirely new postgres app with a couple of replicas.
To follow up on this, I decided to start another Postgres instance from scratch with a single machine. With a single machine, no errors. Scaling up to three also produces no errors. So I can only deduce that this is an issue when you use flyctl to create a High Availability Postgres app with more than one machine; cloning and scaling manually works fine.
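For anyone wanting to reproduce the comparison, the two paths were roughly the following. The app names are placeholders and the flags assume current flyctl syntax, so double-check against `fly postgres create --help`:

```shell
# Path 1: HA cluster created up front -- the setup that produced the logs.
fly postgres create --name ha-db-test --initial-cluster-size 3

# Path 2: start from a single machine, then scale by cloning -- no errors.
fly postgres create --name single-db-test --initial-cluster-size 1
fly machine list -a single-db-test              # note the machine ID
fly machine clone <machine-id> -a single-db-test
fly machine clone <machine-id> -a single-db-test
```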
That is the default interval for its health checks, so that gives us at least one hypothesis. The full configuration (intervals, timeouts, paths) is in the checks subtree of…
fly config show -a db-app-name
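For reference, the checks subtree in that output typically looks something like this. This is illustrative only: the port, path, and 15-second interval match what’s discussed in this thread, but the exact fields and values may differ by flyctl version, so trust the output of the command above:

```json
"checks": {
  "pg": {
    "type": "http",
    "port": 5500,
    "path": "/flycheck/pg",
    "interval": "15s",
    "timeout": "10s"
  },
  "role": {
    "type": "http",
    "port": 5500,
    "path": "/flycheck/role",
    "interval": "15s",
    "timeout": "10s"
  }
}
```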
These are just GET requests to a tiny HTTP server that the machine runs, so you can trigger them yourself, ahead of schedule, via…
fly m list -a db-app-name # consult the IP ADDRESS column of the primary
fly proxy 5500 fdaa:<hex-digits-from-above>:2
And then in a separate terminal…
curl http://127.0.0.1:5500/flycheck/pg
…as many times as necessary to observe a frequency change in the logs (or not!).
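To send a burst of requests rather than repeating the curl by hand, a tiny loop helps. A sketch: `poll_check` is just an illustrative helper name, and the URL assumes the `fly proxy 5500 …` session from the previous step is still running in the other terminal:

```shell
# Hit the proxied health check endpoint `count` times, printing one HTTP
# status code per request, so you can correlate the extra requests with
# the frequency of the auth-failure logs. (If the proxy isn't running,
# curl prints 000 for each failed connection.)
poll_check() {
  local url="$1" count="$2"
  for _ in $(seq 1 "$count"); do
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
  done
}

poll_check "http://127.0.0.1:5500/flycheck/pg" 5
```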
Thanks, @mayailurus, your reply has been helpful, although it hasn’t solved my issue. I also started to see other logs mentioning “pg” and “psql” roles, which I found odd. Closing PGAdmin seemed to remove those extra logs, but some were still occurring at 4am using the “postgres” role.
I’m going to continue monitoring this overnight and see what happens. With a bit of luck, I won’t have these logs being sent to my logging service anymore!