Issues with Postgres connection timeout

Hi,

We’re having recurring issues with our Postgres machine. We’ve tried scaling the machine and adjusting the configuration, but it keeps failing roughly every 7 hours and we can’t find a solution. Health checks are passing. Any help appreciated.
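
For context, here’s roughly what we run to check on the machine and to bring it back when it goes down (a sketch; the app name is taken from the internal hostname that appears in the proxy logs below):

  # Machine state and health-check status
  fly machine status d8dd671ce653d8 -a openformat-db-production
  fly checks list -a openformat-db-production

  # Tail this machine's logs around a failure
  fly logs -a openformat-db-production -i d8dd671ce653d8

  # Manual restart once it's down
  fly machine restart d8dd671ce653d8 -a openformat-db-production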

Here is the machine config:

Machine ID: d8dd671ce653d8
Instance ID: 01HXRA89CMM4FQN2ZQSECNNA11
State: started

VM
  ID            = d8dd671ce653d8
  Instance ID   = 01HXRA89CMM4FQN2ZQSECNNA11
  State         = started
  Image         = flyio/postgres-flex:15.6 (v0.0.51)
  Name          = spring-sunset-4966
  Private IP    = fdaa:5:8e54:a7b:18:96db:5d92:2
  Region        = lhr
  Process Group =
  CPU Kind      = performance
  vCPUs         = 1
  Memory        = 2048 MB
  Created       = 2024-05-03T10:00:58Z
  Updated       = 2024-05-13T06:29:19Z
  Entrypoint    =
  Command       =
  Volume        = vol_40k9zejm8zw9z7mv

Event Logs
STATE  	EVENT 	SOURCE	TIMESTAMP                    	INFO
started	start 	flyd  	2024-05-13T07:29:19.336+01:00	
created	launch	user  	2024-05-13T07:29:16.849+01:00	


Config:
{
  "env": {
    "PRIMARY_REGION": "lhr"
  },
  "init": {},
  "guest": {
    "cpu_kind": "performance",
    "cpus": 1,
    "memory_mb": 2048
  },
  "metadata": {
    "fly-managed-postgres": "true",
    "fly_flyctl_version": "0.2.47",
    "fly_platform_version": "v2",
    "managed-by-fly-deploy": "true"
  },
  "mounts": [
    {
      "encrypted": true,
      "path": "/data",
      "size_gb": 10,
      "volume": "vol_40k9zejm8zw9z7mv",
      "name": "pg_data"
    }
  ],
  "services": [
    {
      "protocol": "tcp",
      "internal_port": 5432,
      "autostop": false,
      "autostart": false,
      "ports": [
        {
          "port": 5432,
          "handlers": [
            "pg_tls"
          ]
        }
      ],
      "concurrency": {
        "type": "connections",
        "hard_limit": 1000,
        "soft_limit": 1000
      },
      "force_instance_key": null
    },
    {
      "protocol": "tcp",
      "internal_port": 5433,
      "autostop": false,
      "autostart": false,
      "ports": [
        {
          "port": 5433,
          "handlers": [
            "pg_tls"
          ]
        }
      ],
      "concurrency": {
        "type": "connections",
        "hard_limit": 1000,
        "soft_limit": 1000
      },
      "force_instance_key": null
    }
  ],
  "metrics": {
    "port": 9187,
    "path": "/metrics"
  },
  "checks": {
    "pg": {
      "port": 5500,
      "type": "http",
      "interval": "15s",
      "timeout": "10s",
      "path": "/flycheck/pg"
    },
    "role": {
      "port": 5500,
      "type": "http",
      "interval": "15s",
      "timeout": "10s",
      "path": "/flycheck/role"
    },
    "vm": {
      "port": 5500,
      "type": "http",
      "interval": "15s",
      "timeout": "10s",
      "path": "/flycheck/vm"
    }
  },
  "image": "flyio/postgres-flex:15.6@sha256:92917e5770cef6666dddbf17061c2f95b9e19b8155be9a8ce8c35e09e5381167",
  "restart": {
    "policy": "on-failure"
  }
}
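
When the machine is up, the check endpoints from the config above can also be probed directly (a sketch, assuming the private 6PN address is reachable, e.g. over the org’s WireGuard network or from another machine in it):

  # flycheck endpoints defined under "checks" (HTTP on port 5500)
  curl "http://[fdaa:5:8e54:a7b:18:96db:5d92:2]:5500/flycheck/pg"
  curl "http://[fdaa:5:8e54:a7b:18:96db:5d92:2]:5500/flycheck/role"
  curl "http://[fdaa:5:8e54:a7b:18:96db:5d92:2]:5500/flycheck/vm"

  # Since the title mentions connection timeouts: Postgres's own
  # max_connections is usually far below the proxy hard_limit of 1000,
  # so it's worth comparing the two
  fly postgres connect -a openformat-db-production
  #   then in psql: SHOW max_connections;
  #                 SELECT count(*) FROM pg_stat_activity;

Between failures the check endpoints all report passing for us.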

Here are the logs, newest first. When it fails, we restart it manually right away:

2024-05-14 19:14:41.762	repmgrd  | [2024-05-14 18:14:41] [NOTICE] monitoring cluster primary "fdaa:5:8e54:a7b:18:96db:5d92:2" (ID: 374181770)
2024-05-14 19:14:41.762	repmgrd  | [2024-05-14 18:14:41] [INFO] "connection_check_type" set to "ping"
2024-05-14 19:14:41.762	repmgrd  | [2024-05-14 18:14:41] [NOTICE] starting monitoring of node "fdaa:5:8e54:a7b:18:96db:5d92:2" (ID: 374181770)
2024-05-14 19:14:41.762	repmgrd  | INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
2024-05-14 19:14:41.744	repmgrd  | [2024-05-14 18:14:41] [INFO] connecting to database "host=fdaa:5:8e54:a7b:18:96db:5d92:2 port=5433 user=repmgr dbname=repmgr connect_timeout=5"
2024-05-14 19:14:41.744	repmgrd  | [2024-05-14 18:14:41] [NOTICE] repmgrd (repmgrd 5.3.3) starting up
2024-05-14 19:14:41.706	repmgrd  | Running...
2024-05-14 19:14:38.688	proxy    | [WARNING]  (408) : Server bk_db/pg1 administratively READY thanks to valid DNS answer.
2024-05-14 19:14:38.688	proxy    | [WARNING]  (408) : Server bk_db/pg1 ('lhr.openformat-db-production.internal') is UP/READY (resolves again).
2024-05-14 19:14:38.688	proxy    | [WARNING]  (408) : bk_db/pg1 changed its IP from (none) to fdaa:5:8e54:a7b:18:96db:5d92:2 by flydns/dns1.
2024-05-14 19:14:38.687	proxy    | [NOTICE]   (406) : Loading success.
2024-05-14 19:14:38.687	proxy    | [NOTICE]   (406) : New worker (408) forked
2024-05-14 19:14:38.645	proxy    | Running...
2024-05-14 19:14:37.645	proxy    | restarting in 1s [attempt 1]
2024-05-14 19:14:37.645	proxy    | Process exited 0
2024-05-14 19:14:37.643	proxy    | [WARNING]  (357) : All workers exited. Exiting... (0)
2024-05-14 19:14:37.643	proxy    | [ALERT]    (357) : Current worker (377) exited with code 143 (Terminated)
2024-05-14 19:14:37.643	proxy    | [NOTICE]   (357) : path to executable is /usr/sbin/haproxy
2024-05-14 19:14:37.643	proxy    | [NOTICE]   (357) : haproxy version is 2.8.9-1~bpo12+1
2024-05-14 19:14:37.533	monitor  | Voting member(s): 1, Active: 1, Inactive: 0, Conflicts: 0
2024-05-14 19:14:36.883	Machine started in 1.139s
2024-05-14 19:14:36.778	postgres | 2024-05-14 18:14:36.777 UTC [341] LOG:  database system is ready to accept connections
2024-05-14 19:14:36.767	postgres | 2024-05-14 18:14:36.767 UTC [382] LOG:  database system was shut down at 2024-05-14 18:14:35 UTC
2024-05-14 19:14:36.764	postgres | 2024-05-14 18:14:36.764 UTC [341] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5433"
2024-05-14 19:14:36.762	postgres | 2024-05-14 18:14:36.762 UTC [341] LOG:  listening on IPv6 address "::", port 5433
2024-05-14 19:14:36.762	postgres | 2024-05-14 18:14:36.762 UTC [341] LOG:  listening on IPv4 address "0.0.0.0", port 5433
2024-05-14 19:14:36.762	postgres | 2024-05-14 18:14:36.762 UTC [341] LOG:  starting PostgreSQL 15.6 (Debian 15.6-1.pgdg120+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-05-14 19:14:36.710	proxy    | [NOTICE]   (357) : Loading success.
2024-05-14 19:14:36.710	proxy    | [NOTICE]   (357) : New worker (377) forked
2024-05-14 19:14:36.705	repmgrd  | restarting in 5s [attempt 1]
2024-05-14 19:14:36.705	repmgrd  | exit status 6
2024-05-14 19:14:36.704	repmgrd  |   user=repmgr connect_timeout=5 dbname=repmgr host=fdaa:5:8e54:a7b:18:96db:5d92:2 port=5433 fallback_application_name=repmgr options=-csearch_path=
2024-05-14 19:14:36.704	repmgrd  | [2024-05-14 18:14:36] [DETAIL] attempted to connect using:
2024-05-14 19:14:36.704	repmgrd  |
2024-05-14 19:14:36.704	repmgrd  | 	Is the server running on that host and accepting TCP/IP connections?
2024-05-14 19:14:36.704	repmgrd  | connection to server at "fdaa:5:8e54:a7b:18:96db:5d92:2", port 5433 failed: Connection refused
2024-05-14 19:14:36.704	repmgrd  | [2024-05-14 18:14:36] [DETAIL]
2024-05-14 19:14:36.704	repmgrd  | [2024-05-14 18:14:36] [ERROR] connection to database failed
2024-05-14 19:14:36.695	repmgrd  | [2024-05-14 18:14:36] [INFO] connecting to database "host=fdaa:5:8e54:a7b:18:96db:5d92:2 port=5433 user=repmgr dbname=repmgr connect_timeout=5"
2024-05-14 19:14:36.695	repmgrd  | [2024-05-14 18:14:36] [NOTICE] repmgrd (repmgrd 5.3.3) starting up
2024-05-14 19:14:36.626	monitor  | Pruning every 24h0m0s...
2024-05-14 19:14:36.618	monitor  | Monitoring replication slots
2024-05-14 19:14:36.618	monitor  | Monitoring cluster state
2024-05-14 19:14:36.618	monitor  | Monitoring dead members
2024-05-14 19:14:36.525	admin    | Running...
2024-05-14 19:14:36.525	exporter | Running...
2024-05-14 19:14:36.523	proxy    | Running...
2024-05-14 19:14:36.522	repmgrd  | Running...
2024-05-14 19:14:36.515	monitor  | Running...
2024-05-14 19:14:36.514	postgres | Running...
2024-05-14 19:14:36.353	2024/05/14 18:14:36 INFO SSH listening listen_address=[fdaa:5:8e54:a7b:18:96db:5d92:2]:22 dns_server=[fdaa::3]:53
2024-05-14 19:14:36.347	 INFO [fly api proxy] listening at /.fly/api
2024-05-14 19:14:36.340	 INFO Preparing to run: `docker-entrypoint.sh start` as root
2024-05-14 19:14:36.339	 INFO Resized /data to 10720641024 bytes
2024-05-14 19:14:36.336	 INFO Mounting /dev/vdb at /data w/ uid: 0, gid: 0 and chmod 0755
2024-05-14 19:14:36.266	 INFO Starting init (commit: d772ddd9)...
2024-05-14 19:14:36.076	[    0.164970] PCI: Fatal: No config space access function found
2024-05-14 19:14:36.026	[    0.156061] Spectre V2 : WARNING: Unprivileged eBPF is enabled with eIBRS on, data leaks possible via Spectre v2 BHB attacks!
2024-05-14 19:14:35.122	[34141.938099] reboot: Restarting system
2024-05-14 19:14:35.118	 INFO Umounting /dev/vdb from /data
2024-05-14 19:14:35.117	 INFO Starting clean up.
2024-05-14 19:14:35.103	 INFO Main child exited normally with code: 0
2024-05-14 19:14:35.101	postgres | Process exited 0
2024-05-14 19:14:35.100	postgres | 2024-05-14 18:14:35.100 UTC [342] LOG:  database system is shut down
2024-05-14 19:14:35.091	postgres | 2024-05-14 18:14:35.091 UTC [381] LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.004 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=5 kB
2024-05-14 19:14:35.088	repmgrd  | Process exited 0
2024-05-14 19:14:35.087	postgres | 2024-05-14 18:14:35.087 UTC [381] LOG:  checkpoint starting: shutdown immediate
2024-05-14 19:14:35.087	repmgrd  | [2024-05-14 18:14:35] [INFO] repmgrd terminating...
2024-05-14 19:14:35.087	repmgrd  | SELECT pg_catalog.pg_is_in_recovery()
2024-05-14 19:14:35.087	repmgrd  | [2024-05-14 18:14:35] [DETAIL] query text is:
2024-05-14 19:14:35.087	repmgrd  |
2024-05-14 19:14:35.087	repmgrd  | 	before or while processing the request.
2024-05-14 19:14:35.087	repmgrd  | 	This probably means the server terminated abnormally
2024-05-14 19:14:35.087	repmgrd  | server closed the connection unexpectedly
2024-05-14 19:14:35.087	repmgrd  | FATAL:  terminating connection due to administrator command
2024-05-14 19:14:35.087	repmgrd  | [2024-05-14 18:14:35] [DETAIL]
2024-05-14 19:14:35.087	repmgrd  | [2024-05-14 18:14:35] [ERROR] unable to determine if server is in recovery
2024-05-14 19:14:35.087	postgres | 2024-05-14 18:14:35.087 UTC [381] LOG:  shutting down
2024-05-14 19:14:35.087	postgres | 2024-05-14 18:14:35.085 UTC [438] FATAL:  terminating connection due to administrator command
2024-05-14 19:14:35.087	postgres | 2024-05-14 18:14:35.084 UTC [424] FATAL:  terminating connection due to administrator command
2024-05-14 19:14:35.087	postgres | 2024-05-14 18:14:35.083 UTC [342] LOG:  background worker "logical replication launcher" (PID 386) exited with exit code 1
2024-05-14 19:14:35.087	postgres | 2024-05-14 18:14:35.080 UTC [342] LOG:  aborting any active transactions
2024-05-14 19:14:35.087	proxy    | [WARNING]  (405) : All workers exited. Exiting... (130)
2024-05-14 19:14:35.087	proxy    | [ALERT]    (405) : Current worker (407) exited with code 130 (Interrupt)
2024-05-14 19:14:35.087	proxy    | exit status 130
2024-05-14 19:14:35.087	exporter | signal: interrupt
2024-05-14 19:14:35.087	admin    | signal: interrupt
2024-05-14 19:14:35.080	repmgrd  | [2024-05-14 18:14:35] [NOTICE] INT signal received
2024-05-14 19:14:35.080	postgres | 2024-05-14 18:14:35.078 UTC [342] LOG:  received fast shutdown request
2024-05-14 19:14:35.080	monitor  | signal: interrupt
2024-05-14 19:14:35.080	proxy    | [WARNING]  (405) : Exiting Master process...
2024-05-14 19:14:35.080	proxy    | [NOTICE]   (405) : path to executable is /usr/sbin/haproxy
2024-05-14 19:14:35.080	proxy    | [NOTICE]   (405) : haproxy version is 2.8.9-1~bpo12+1
2024-05-14 19:14:35.080	monitor  | Stopping interrupt...
2024-05-14 19:14:35.080	admin    | Stopping interrupt...
2024-05-14 19:14:35.080	repmgrd  | Stopping interrupt...
2024-05-14 19:14:35.080	exporter | Stopping interrupt...
2024-05-14 19:14:35.080	proxy    | Stopping interrupt...
2024-05-14 19:14:35.077	postgres | Stopping interrupt...
2024-05-14 19:14:35.077	Supervisor stopping
2024-05-14 19:14:35.077	Got interrupt, stopping
2024-05-14 19:14:35.077	 INFO Sending signal SIGINT to main child process w/ PID 323
2024-05-14 19:14:04.677	repmgrd  | [2024-05-14 18:14:04] [INFO] monitoring primary node "fdaa:5:8e54:a7b:18:96db:5d92:2" (ID: 374181770) in normal state
2024-05-14 19:10:33.906	monitor  | Voting member(s): 1, Active: 1, Inactive: 0, Conflicts: 0
2024-05-14 19:09:02.957	repmgrd  | [2024-05-14 18:09:02] [INFO] monitoring primary node "fdaa:5:8e54:a7b:18:96db:5d92:2" (ID: 374181770) in normal state
2024-05-14 19:05:33.905	monitor  | Voting member(s): 1, Active: 1, Inactive: 0, Conflicts: 0
2024-05-14 19:04:01.185	repmgrd  | [2024-05-14 18:04:01] [INFO] monitoring primary node "fdaa:5:8e54:a7b:18:96db:5d92:2" (ID: 374181770) in normal state
2024-05-14 19:00:33.905	monitor  | Voting member(s): 1, Active: 1, Inactive: 0, Conflicts: 0
2024-05-14 18:58:59.393	repmgrd  | [2024-05-14 17:58:59] [INFO] monitoring primary node "fdaa:5:8e54:a7b:18:96db:5d92:2" (ID: 374181770) in normal state
2024-05-14 18:55:33.904	monitor  | Voting member(s): 1, Active: 1, Inactive: 0, Conflicts: 0
2024-05-14 18:53:57.727	repmgrd  | [2024-05-14 17:53:57] [INFO] monitoring primary node "fdaa:5:8e54:a7b:18:96db:5d92:2" (ID: 374181770) in normal state
2024-05-14 18:50:33.903	monitor  | Voting member(s): 1, Active: 1, Inactive: 0, Conflicts: 0
2024-05-14 18:48:56.041	repmgrd  | [2024-05-14 17:48:56] [INFO] monitoring primary node "fdaa:5:8e54:a7b:18:96db:5d92:2" (ID: 374181770) in normal state
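
One thing we noticed reading these: the failure starts with init sending SIGINT to the main process (“Got interrupt, stopping”), Postgres performs a fast shutdown, and the main child exits normally with code 0. Our restart policy is on-failure, which as we understand it won’t restart a machine that exits cleanly, hence the manual restarts. If it helps to auto-recover while we chase the root cause, we assume something like this would switch the policy (flag and values per our reading of flyctl, so treat as unverified):

  # Switch the restart policy so clean exits also restart the machine
  # (values we believe flyctl accepts: no / always / on-fail)
  fly machine update d8dd671ce653d8 -a openformat-db-production --restart always

That said, we’d still like to understand what’s sending the interrupt roughly every 7 hours in the first place.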
