Hi,
We’re having recurring issues with our Postgres machine on Fly.io: it fails roughly every 7 hours, and we haven’t been able to find a solution. We’ve tried scaling the machine and adjusting the configuration, with no change. Health checks are passing the whole time. Any help appreciated.
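For context, everything looks healthy between failures; we’ve been checking with the usual flyctl commands, something like:

$ fly checks list -a openformat-db-production        # app name per the proxy logs below
$ fly machine status d8dd671ce653d8 -a openformat-db-production

Both report the machine as started and all checks passing right up until it fails.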
Here is the machine config:
Machine ID: d8dd671ce653d8
Instance ID: 01HXRA89CMM4FQN2ZQSECNNA11
State: started
VM
  ID            = d8dd671ce653d8
  Instance ID   = 01HXRA89CMM4FQN2ZQSECNNA11
  State         = started
  Image         = flyio/postgres-flex:15.6 (v0.0.51)
  Name          = spring-sunset-4966
  Private IP    = fdaa:5:8e54:a7b:18:96db:5d92:2
  Region        = lhr
  Process Group =
  CPU Kind      = performance
  vCPUs         = 1
  Memory        = 2048
  Created       = 2024-05-03T10:00:58Z
  Updated       = 2024-05-13T06:29:19Z
  Entrypoint    =
  Command       =
  Volume        = vol_40k9zejm8zw9z7mv
Event Logs
STATE    EVENT   SOURCE  TIMESTAMP                       INFO
started  start   flyd    2024-05-13T07:29:19.336+01:00
created  launch  user    2024-05-13T07:29:16.849+01:00
Config:
{
  "env": {
    "PRIMARY_REGION": "lhr"
  },
  "init": {},
  "guest": {
    "cpu_kind": "performance",
    "cpus": 1,
    "memory_mb": 2048
  },
  "metadata": {
    "fly-managed-postgres": "true",
    "fly_flyctl_version": "0.2.47",
    "fly_platform_version": "v2",
    "managed-by-fly-deploy": "true"
  },
  "mounts": [
    {
      "encrypted": true,
      "path": "/data",
      "size_gb": 10,
      "volume": "vol_40k9zejm8zw9z7mv",
      "name": "pg_data"
    }
  ],
  "services": [
    {
      "protocol": "tcp",
      "internal_port": 5432,
      "autostop": false,
      "autostart": false,
      "ports": [
        {
          "port": 5432,
          "handlers": [
            "pg_tls"
          ]
        }
      ],
      "concurrency": {
        "type": "connections",
        "hard_limit": 1000,
        "soft_limit": 1000
      },
      "force_instance_key": null
    },
    {
      "protocol": "tcp",
      "internal_port": 5433,
      "autostop": false,
      "autostart": false,
      "ports": [
        {
          "port": 5433,
          "handlers": [
            "pg_tls"
          ]
        }
      ],
      "concurrency": {
        "type": "connections",
        "hard_limit": 1000,
        "soft_limit": 1000
      },
      "force_instance_key": null
    }
  ],
  "metrics": {
    "port": 9187,
    "path": "/metrics"
  },
  "checks": {
    "pg": {
      "port": 5500,
      "type": "http",
      "interval": "15s",
      "timeout": "10s",
      "path": "/flycheck/pg"
    },
    "role": {
      "port": 5500,
      "type": "http",
      "interval": "15s",
      "timeout": "10s",
      "path": "/flycheck/role"
    },
    "vm": {
      "port": 5500,
      "type": "http",
      "interval": "15s",
      "timeout": "10s",
      "path": "/flycheck/vm"
    }
  },
  "image": "flyio/postgres-flex:15.6@sha256:92917e5770cef6666dddbf17061c2f95b9e19b8155be9a8ce8c35e09e5381167",
  "restart": {
    "policy": "on-failure"
  }
}
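For what it’s worth, the health checks above are plain HTTP on port 5500, so we can hit them directly from another machine on our private network to double-check; something like:

$ curl -s "http://[fdaa:5:8e54:a7b:18:96db:5d92:2]:5500/flycheck/pg"     # private IPv6 from the VM block above
$ curl -s "http://[fdaa:5:8e54:a7b:18:96db:5d92:2]:5500/flycheck/role"
$ curl -s "http://[fdaa:5:8e54:a7b:18:96db:5d92:2]:5500/flycheck/vm"

These come back healthy between failures, consistent with the passing checks.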
When it fails, we immediately restart it manually; the restart is roughly:
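$ fly machine restart d8dd671ce653d8 -a openformat-db-production

Here are the logs from around one of these failures (newest first):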
2024-05-14 19:14:41.762 repmgrd | [2024-05-14 18:14:41] [NOTICE] monitoring cluster primary "fdaa:5:8e54:a7b:18:96db:5d92:2" (ID: 374181770)
2024-05-14 19:14:41.762 repmgrd | [2024-05-14 18:14:41] [INFO] "connection_check_type" set to "ping"
2024-05-14 19:14:41.762 repmgrd | [2024-05-14 18:14:41] [NOTICE] starting monitoring of node "fdaa:5:8e54:a7b:18:96db:5d92:2" (ID: 374181770)
2024-05-14 19:14:41.762 repmgrd | INFO: set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
2024-05-14 19:14:41.744 repmgrd | [2024-05-14 18:14:41] [INFO] connecting to database "host=fdaa:5:8e54:a7b:18:96db:5d92:2 port=5433 user=repmgr dbname=repmgr connect_timeout=5"
2024-05-14 19:14:41.744 repmgrd | [2024-05-14 18:14:41] [NOTICE] repmgrd (repmgrd 5.3.3) starting up
2024-05-14 19:14:41.706 repmgrd | Running...
2024-05-14 19:14:38.688 proxy | [WARNING] (408) : Server bk_db/pg1 administratively READY thanks to valid DNS answer.
2024-05-14 19:14:38.688 proxy | [WARNING] (408) : Server bk_db/pg1 ('lhr.openformat-db-production.internal') is UP/READY (resolves again).
2024-05-14 19:14:38.688 proxy | [WARNING] (408) : bk_db/pg1 changed its IP from (none) to fdaa:5:8e54:a7b:18:96db:5d92:2 by flydns/dns1.
2024-05-14 19:14:38.687 proxy | [NOTICE] (406) : Loading success.
2024-05-14 19:14:38.687 proxy | [NOTICE] (406) : New worker (408) forked
2024-05-14 19:14:38.645 proxy | Running...
2024-05-14 19:14:37.645 proxy | restarting in 1s [attempt 1]
2024-05-14 19:14:37.645 proxy | Process exited 0
2024-05-14 19:14:37.643 proxy | [WARNING] (357) : All workers exited. Exiting... (0)
2024-05-14 19:14:37.643 proxy | [ALERT] (357) : Current worker (377) exited with code 143 (Terminated)
2024-05-14 19:14:37.643 proxy | [NOTICE] (357) : path to executable is /usr/sbin/haproxy
2024-05-14 19:14:37.643 proxy | [NOTICE] (357) : haproxy version is 2.8.9-1~bpo12+1
2024-05-14 19:14:37.533 Voting member(s): 1, Active: 1, Inactive: 0, Conflicts: 0
2024-05-14 19:14:36.883 Machine started in 1.139s
2024-05-14 19:14:36.778 postgres | 2024-05-14 18:14:36.777 UTC [341] LOG: database system is ready to accept connections
2024-05-14 19:14:36.767 postgres | 2024-05-14 18:14:36.767 UTC [382] LOG: database system was shut down at 2024-05-14 18:14:35 UTC
2024-05-14 19:14:36.764 postgres | 2024-05-14 18:14:36.764 UTC [341] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5433"
2024-05-14 19:14:36.762 postgres | 2024-05-14 18:14:36.762 UTC [341] LOG: listening on IPv6 address "::", port 5433
2024-05-14 19:14:36.762 postgres | 2024-05-14 18:14:36.762 UTC [341] LOG: listening on IPv4 address "0.0.0.0", port 5433
2024-05-14 19:14:36.762 postgres | 2024-05-14 18:14:36.762 UTC [341] LOG: starting PostgreSQL 15.6 (Debian 15.6-1.pgdg120+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-05-14 19:14:36.710 proxy | [NOTICE] (357) : Loading success.
2024-05-14 19:14:36.710 proxy | [NOTICE] (357) : New worker (377) forked
2024-05-14 19:14:36.705 repmgrd | restarting in 5s [attempt 1]
2024-05-14 19:14:36.705 repmgrd | exit status 6
2024-05-14 19:14:36.704 repmgrd | user=repmgr connect_timeout=5 dbname=repmgr host=fdaa:5:8e54:a7b:18:96db:5d92:2 port=5433 fallback_application_name=repmgr options=-csearch_path=
2024-05-14 19:14:36.704 repmgrd | [2024-05-14 18:14:36] [DETAIL] attempted to connect using:
2024-05-14 19:14:36.704 repmgrd |
2024-05-14 19:14:36.704 repmgrd | Is the server running on that host and accepting TCP/IP connections?
2024-05-14 19:14:36.704 repmgrd | connection to server at "fdaa:5:8e54:a7b:18:96db:5d92:2", port 5433 failed: Connection refused
2024-05-14 19:14:36.704 repmgrd | [2024-05-14 18:14:36] [DETAIL]
2024-05-14 19:14:36.704 repmgrd | [2024-05-14 18:14:36] [ERROR] connection to database failed
2024-05-14 19:14:36.695 repmgrd | [2024-05-14 18:14:36] [INFO] connecting to database "host=fdaa:5:8e54:a7b:18:96db:5d92:2 port=5433 user=repmgr dbname=repmgr connect_timeout=5"
2024-05-14 19:14:36.695 repmgrd | [2024-05-14 18:14:36] [NOTICE] repmgrd (repmgrd 5.3.3) starting up
2024-05-14 19:14:36.626 monitor | Pruning every 24h0m0s...
2024-05-14 19:14:36.618 monitor | Monitoring replication slots
2024-05-14 19:14:36.618 monitor | Monitoring cluster state
2024-05-14 19:14:36.618 monitor | Monitoring dead members
2024-05-14 19:14:36.525 admin | Running...
2024-05-14 19:14:36.525 exporter | Running...
2024-05-14 19:14:36.523 proxy | Running...
2024-05-14 19:14:36.522 repmgrd | Running...
2024-05-14 19:14:36.515 monitor | Running...
2024-05-14 19:14:36.514 postgres | Running...
2024-05-14 19:14:36.353 2024/05/14 18:14:36 INFO SSH listening listen_address=[fdaa:5:8e54:a7b:18:96db:5d92:2]:22 dns_server=[fdaa::3]:53
2024-05-14 19:14:36.347 INFO [fly api proxy] listening at /.fly/api
2024-05-14 19:14:36.340 INFO Preparing to run: `docker-entrypoint.sh start` as root
2024-05-14 19:14:36.339 INFO Resized /data to 10720641024 bytes
2024-05-14 19:14:36.336 INFO Mounting /dev/vdb at /data w/ uid: 0, gid: 0 and chmod 0755
2024-05-14 19:14:36.266 INFO Starting init (commit: d772ddd9)...
2024-05-14 19:14:36.076 [ 0.164970] PCI: Fatal: No config space access function found
2024-05-14 19:14:36.026 [ 0.156061] Spectre V2 : WARNING: Unprivileged eBPF is enabled with eIBRS on, data leaks possible via Spectre v2 BHB attacks!
2024-05-14 19:14:35.122 [34141.938099] reboot: Restarting system
2024-05-14 19:14:35.118 INFO Umounting /dev/vdb from /data
2024-05-14 19:14:35.117 INFO Starting clean up.
2024-05-14 19:14:35.103 INFO Main child exited normally with code: 0
2024-05-14 19:14:35.101 postgres | Process exited 0
2024-05-14 19:14:35.100 postgres | 2024-05-14 18:14:35.100 UTC [342] LOG: database system is shut down
2024-05-14 19:14:35.091 postgres | 2024-05-14 18:14:35.091 UTC [381] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.004 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=5 kB
2024-05-14 19:14:35.088 repmgrd | Process exited 0
2024-05-14 19:14:35.087 postgres | 2024-05-14 18:14:35.087 UTC [381] LOG: checkpoint starting: shutdown immediate
2024-05-14 19:14:35.087 repmgrd | [2024-05-14 18:14:35] [INFO] repmgrd terminating...
2024-05-14 19:14:35.087 repmgrd | SELECT pg_catalog.pg_is_in_recovery()
2024-05-14 19:14:35.087 repmgrd | [2024-05-14 18:14:35] [DETAIL] query text is:
2024-05-14 19:14:35.087 repmgrd |
2024-05-14 19:14:35.087 repmgrd | before or while processing the request.
2024-05-14 19:14:35.087 repmgrd | This probably means the server terminated abnormally
2024-05-14 19:14:35.087 repmgrd | server closed the connection unexpectedly
2024-05-14 19:14:35.087 repmgrd | FATAL: terminating connection due to administrator command
2024-05-14 19:14:35.087 repmgrd | [2024-05-14 18:14:35] [DETAIL]
2024-05-14 19:14:35.087 repmgrd | [2024-05-14 18:14:35] [ERROR] unable to determine if server is in recovery
2024-05-14 19:14:35.087 postgres | 2024-05-14 18:14:35.087 UTC [381] LOG: shutting down
2024-05-14 19:14:35.087 postgres | 2024-05-14 18:14:35.085 UTC [438] FATAL: terminating connection due to administrator command
2024-05-14 19:14:35.087 postgres | 2024-05-14 18:14:35.084 UTC [424] FATAL: terminating connection due to administrator command
2024-05-14 19:14:35.087 postgres | 2024-05-14 18:14:35.083 UTC [342] LOG: background worker "logical replication launcher" (PID 386) exited with exit code 1
2024-05-14 19:14:35.087 postgres | 2024-05-14 18:14:35.080 UTC [342] LOG: aborting any active transactions
2024-05-14 19:14:35.087 proxy | [WARNING] (405) : All workers exited. Exiting... (130)
2024-05-14 19:14:35.087 proxy | [ALERT] (405) : Current worker (407) exited with code 130 (Interrupt)
2024-05-14 19:14:35.087 proxy | exit status 130
2024-05-14 19:14:35.087 exporter | signal: interrupt
2024-05-14 19:14:35.087 admin | signal: interrupt
2024-05-14 19:14:35.080 repmgrd | [2024-05-14 18:14:35] [NOTICE] INT signal received
2024-05-14 19:14:35.080 postgres | 2024-05-14 18:14:35.078 UTC [342] LOG: received fast shutdown request
2024-05-14 19:14:35.080 monitor | signal: interrupt
2024-05-14 19:14:35.080 proxy | [WARNING] (405) : Exiting Master process...
2024-05-14 19:14:35.080 proxy | [NOTICE] (405) : path to executable is /usr/sbin/haproxy
2024-05-14 19:14:35.080 proxy | [NOTICE] (405) : haproxy version is 2.8.9-1~bpo12+1
2024-05-14 19:14:35.080 monitor | Stopping interrupt...
2024-05-14 19:14:35.080 admin | Stopping interrupt...
2024-05-14 19:14:35.080 repmgrd | Stopping interrupt...
2024-05-14 19:14:35.080 exporter | Stopping interrupt...
2024-05-14 19:14:35.080 proxy | Stopping interrupt...
2024-05-14 19:14:35.077 postgres | Stopping interrupt...
2024-05-14 19:14:35.077 Supervisor stopping
2024-05-14 19:14:35.077 Got interrupt, stopping
2024-05-14 19:14:35.077 INFO Sending signal SIGINT to main child process w/ PID 323
2024-05-14 19:14:04.677 repmgrd | [2024-05-14 18:14:04] [INFO] monitoring primary node "fdaa:5:8e54:a7b:18:96db:5d92:2" (ID: 374181770) in normal state
2024-05-14 19:10:33.906 monitor | Voting member(s): 1, Active: 1, Inactive: 0, Conflicts: 0
2024-05-14 19:09:02.957 repmgrd | [2024-05-14 18:09:02] [INFO] monitoring primary node "fdaa:5:8e54:a7b:18:96db:5d92:2" (ID: 374181770) in normal state
2024-05-14 19:05:33.905 monitor | Voting member(s): 1, Active: 1, Inactive: 0, Conflicts: 0
2024-05-14 19:04:01.185 repmgrd | [2024-05-14 18:04:01] [INFO] monitoring primary node "fdaa:5:8e54:a7b:18:96db:5d92:2" (ID: 374181770) in normal state
2024-05-14 19:00:33.905 monitor | Voting member(s): 1, Active: 1, Inactive: 0, Conflicts: 0
2024-05-14 18:58:59.393 repmgrd | [2024-05-14 17:58:59] [INFO] monitoring primary node "fdaa:5:8e54:a7b:18:96db:5d92:2" (ID: 374181770) in normal state
2024-05-14 18:55:33.904 monitor | Voting member(s): 1, Active: 1, Inactive: 0, Conflicts: 0
2024-05-14 18:53:57.727 repmgrd | [2024-05-14 17:53:57] [INFO] monitoring primary node "fdaa:5:8e54:a7b:18:96db:5d92:2" (ID: 374181770) in normal state
2024-05-14 18:50:33.903 monitor | Voting member(s): 1, Active: 1, Inactive: 0, Conflicts: 0
2024-05-14 18:48:56.041 repmgrd | [2024-05-14 17:48:56] [INFO] monitoring primary node "fdaa:5:8e54:a7b:18:96db:5d92:2" (ID: 374181770) in normal state