PG Machine configuration error and HTTP 502 errors thrown

Hi All,

I started seeing strange behavior on one of the pages. Just one page out of the whole project could not be loaded, and an HTTP 502 error was thrown. I checked the logs on the VM and found a “resource limit reached” error, so I tried to scale up memory for that machine (flyctl scale memory) … then everything went sideways and my VM went into an infinite loop with these error messages:

2025-11-20T06:06:15.523 app[78432debe79658] sin [info] proxy | exit status 1

2025-11-20T06:06:15.523 app[78432debe79658] sin [info] reader error: read ptm: input/output error

2025-11-20T06:06:15.524 app[78432debe79658] sin [info] proxy | restarting in 1s [attempt 283]

2025-11-20T06:06:16.063 app[6830311bd62e18] sin [info] monitor | Voting member(s): 2, Active: 2, Inactive: 0, Conflicts: 0

2025-11-20T06:06:16.201 app[6830311bd62e18] sin [info] proxy | Running...

2025-11-20T06:06:16.254 app[6830311bd62e18] sin [info] proxy | [NOTICE] (1949) : haproxy version is 2.8.5-1ubuntu3.4

2025-11-20T06:06:16.254 app[6830311bd62e18] sin [info] proxy | [NOTICE] (1949) : path to executable is /usr/sbin/haproxy

2025-11-20T06:06:16.254 app[6830311bd62e18] sin [info] proxy | [ALERT] (1949) : config : parsing [/fly/haproxy.cfg:38] : backend 'bk_db', another server named 'pg1' was already defined at line 37, please use distinct names.

2025-11-20T06:06:16.254 app[6830311bd62e18] sin [info] proxy | [ALERT] (1949) : config : parsing [/fly/haproxy.cfg:38] : backend 'bk_db', another server named 'pg2' was already defined at line 37, please use distinct names.

2025-11-20T06:06:16.254 app[6830311bd62e18] sin [info] proxy | [ALERT] (1949) : config : parsing [/fly/haproxy.cfg:38] : backend 'bk_db', another server named 'pg3' was already defined at line 37, please use distinct names.

2025-11-20T06:06:16.254 app[6830311bd62e18] sin [info] proxy | [ALERT] (1949) : config : parsing [/fly/haproxy.cfg:38] : backend 'bk_db', another server named 'pg4' was already defined at line 37, please use distinct names.

2025-11-20T06:06:16.254 app[6830311bd62e18] sin [info] proxy | [ALERT] (1949) : config : parsing [/fly/haproxy.cfg:38] : backend 'bk_db', another server named 'pg5' was already defined at line 37, please use distinct names.

2025-11-20T06:06:16.254 app[6830311bd62e18] sin [info] proxy | [ALERT] (1949) : config : parsing [/fly/haproxy.cfg:38] : backend 'bk_db', another server named 'pg6' was already defined at line 37, please use distinct names.

2025-11-20T06:06:16.254 app[6830311bd62e18] sin [info] proxy | [ALERT] (1949) : config : parsing [/fly/haproxy.cfg:38] : backend 'bk_db', another server named 'pg7' was already defined at line 37, please use distinct names.

2025-11-20T06:06:16.254 app[6830311bd62e18] sin [info] proxy | [ALERT] (1949) : config : parsing [/fly/haproxy.cfg:38] : backend 'bk_db', another server named 'pg8' was already defined at line 37, please use distinct names.

2025-11-20T06:06:16.254 app[6830311bd62e18] sin [info] proxy | [ALERT] (1949) : config : parsing [/fly/haproxy.cfg:38] : backend 'bk_db', another server named 'pg9' was already defined at line 37, please use distinct names.

2025-11-20T06:06:16.254 app[6830311bd62e18] sin [info] proxy | [ALERT] (1949) : config : parsing [/fly/haproxy.cfg:38] : backend 'bk_db', another server named 'pg10' was already defined at line 37, please use distinct names.

2025-11-20T06:06:16.254 app[6830311bd62e18] sin [info] proxy | [ALERT] (1949) : config : Fatal errors found in configuration.

2025-11-20T06:06:16.256 app[6830311bd62e18] sin [info] repmgrd | [2025-11-20 06:06:16] [INFO] monitoring primary node "6830311bd62e18" (ID: 377450399) in normal state

2025-11-20T06:06:16.258 app[6830311bd62e18] sin [info] proxy | exit status 1
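For context on those ALERT lines: HAProxy aborts at startup when two `server` entries in the same backend share a name, which is why the proxy kept exiting with status 1. A valid `bk_db` backend needs one uniquely named `server` line per node, roughly like this (a sketch only; the addresses are placeholders, and on Fly Machines `/fly/haproxy.cfg` is generated by the Postgres image, so it normally should not be hand-edited):

```
backend bk_db
    mode tcp
    # every server in a backend must have a distinct name
    server pg1 [fdaa::1]:5433 check
    server pg2 [fdaa::2]:5433 check
    server pg3 [fdaa::3]:5433 check
```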

All PG machines are up and running and all health checks pass, but I can’t reach any page and a 502 is thrown. Can anyone suggest a direction to search in, please?

Hi again… Managed Postgres is now available in Singapore, so migrating to it would be my main recommendation. All these scaling, poking-around, and snapshot-recovery headaches will go away!

There’s a separate procedure for scaling Legacy Postgres, so I’m not entirely surprised that the above caused problems.

https://fly.io/docs/postgres/managing/scaling/
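Per that procedure, memory on a Legacy (unmanaged) Postgres cluster is scaled with the usual commands, but it is worth double-checking the flags against the docs above before running anything (a sketch; the app name and machine ID are placeholders):

```shell
# Scale memory for the machines in the Postgres app
fly scale memory 2048 -a my-pg-app

# Or update a single machine's VM memory individually
fly machine update <machine-id> --vm-memory 2048 -a my-pg-app
```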

Legacy Postgres was really only intended for people who were expert database administrators already—and just wanted to save a little time on typing, etc. The puzzles and emergencies will only get worse as the months wear on and all this gets more and more deprecated…

I understand… and I need to explore this possibility further. Unfortunately I have another project running and had no time to do proper maintenance on this one.

What can be done to restore the project? It is all down now, as the server is overwhelmed with those errors and can’t accept any more requests.

The volume-forking trick (with explicit volume ID) from earlier is inelegant but is usually enough to shake people free of this:

https://community.fly.io/t/urgency-problems-with-postgres-the-database-is-not-responding/19926/2

You may need an explicit --image-ref, if your PG Flex version is on the older side.
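For readers hitting the same wall, the fork-based recovery looks roughly like this (a sketch; the app names, volume ID, and image tag are placeholders, so verify the flags with `fly postgres create --help` first):

```shell
# Find the volume ID backing the broken cluster's primary
fly volumes list -a old-pg-app

# Stand up a fresh cluster from a fork of that volume
fly postgres create --name new-pg-app --region sin \
  --fork-from old-pg-app:<volume-id>

# If the new image refuses the old data directory, pin an older one:
#   fly postgres create ... --image-ref flyio/postgres-flex:15
```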

I managed to create a new PG cluster from the volume image. After altering the DATABASE_URL of the old app to point at the new cluster, I got all the data back.
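For completeness, repointing an app at a new cluster is a single secret update, which also restarts the app to pick it up (a sketch; the hostname, credentials, and database name are placeholders):

```shell
fly secrets set -a my-web-app \
  DATABASE_URL="postgres://user:password@new-pg-app.flycast:5432/mydb"
```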

My initial problem remains, though… one page does not want to load! I don’t see anything in the machine logs for either PG or the app.

Logs from my app

2025-11-24T20:31:41.363 app[1781345a465e58] arn [info] [ 440.533086] reboot: Restarting system

2025-11-24T20:34:25.773 app[1781034b441038] fra [info] [2025-11-24 20:34:25 +0000] [650] [CRITICAL] WORKER TIMEOUT (pid:668)

2025-11-24T20:34:25.774 app[1781034b441038] fra [info] [2025-11-24 20:34:25 +0000] [668] [INFO] Worker exiting (pid: 668)

2025-11-24T20:34:25.774 proxy[1781034b441038] fra [error] [PU02] could not complete HTTP request to instance: connection closed before message completed

2025-11-24T20:34:25.951 app[1781034b441038] fra [info] [2025-11-24 20:34:25 +0000] [669] [INFO] Booting worker with pid: 669

Logs from PG

2025-11-24T20:17:38.239 app[287454ea4e0418] sin [info] repmgrd | [2025-11-24 20:17:38] [INFO] monitoring primary node "287454ea4e0418" (ID: 377450399) in normal state

2025-11-24T20:22:31.191 app[287454ea4e0418] sin [info] monitor | Voting member(s): 3, Active: 3, Inactive: 0, Conflicts: 0

2025-11-24T20:22:40.189 app[287454ea4e0418] sin [info] repmgrd | [2025-11-24 20:22:40] [INFO] monitoring primary node "287454ea4e0418" (ID: 377450399) in normal state

2025-11-24T20:27:26.879 app[287454ea4e0418] sin [info] postgres | 2025-11-24 20:27:26.878 UTC [708] LOG: checkpoint starting: time

2025-11-24T20:27:27.384 app[287454ea4e0418] sin [info] postgres | 2025-11-24 20:27:27.384 UTC [708] LOG: checkpoint complete: wrote 6 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.503 s, sync=0.001 s, total=0.506 s; sync files=6, longest=0.001 s, average=0.001 s; distance=13 kB, estimate=13 kB; lsn=0/13259240, redo lsn=0/13259208

2025-11-24T20:27:31.142 app[287454ea4e0418] sin [info] monitor | Voting member(s): 3, Active: 3, Inactive: 0, Conflicts: 0

2025-11-24T20:27:42.190 app[287454ea4e0418] sin [info] repmgrd | [2025-11-24 20:27:42] [INFO] monitoring primary node "287454ea4e0418" (ID: 377450399) in normal state

2025-11-24T20:32:31.125 app[287454ea4e0418] sin [info] monitor | Voting member(s): 3, Active: 3, Inactive: 0, Conflicts: 0

2025-11-24T20:32:44.201 app[287454ea4e0418] sin [info] repmgrd | [2025-11-24 20:32:44] [INFO] monitoring primary node "287454ea4e0418" (ID: 377450399) in normal state

2025-11-24T20:37:31.185 app[287454ea4e0418] sin [info] monitor | Voting member(s): 3, Active: 3, Inactive: 0, Conflicts: 0

2025-11-24T20:37:44.318 app[287454ea4e0418] sin [info] repmgrd | [2025-11-24 20:37:44] [INFO] monitoring primary node "287454ea4e0418" (ID: 377450399) in normal state

The browser just shows “HTTP ERROR 502”. Any suggestions, please?

Hm… You do have an error in the web-app logs there…

This Machine is in Germany, but the database is way over in Singapore.

Long-distance Postgres is something that is really best avoided, when you can. What is the full list of regions that your web app has Machines in?
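To put rough numbers on why distance matters (a back-of-the-envelope sketch; the RTT and query count are assumptions, not measurements from this app): every sequential query pays a full Frankfurt-to-Singapore round trip, and that alone can exceed a worker timeout.

```python
def request_floor_seconds(num_queries: int, rtt_seconds: float) -> float:
    """Lower bound on request time from network round trips alone,
    assuming the queries run sequentially (e.g. an ORM N+1 pattern)."""
    return num_queries * rtt_seconds

# Assuming ~160 ms fra <-> sin round-trip time, a page that issues
# 200 sequential queries spends at least 32 s on the network alone,
# already past Gunicorn's default 30 s worker timeout -- enough to
# reproduce the WORKER TIMEOUT / 502 pattern in the logs above.
floor = request_floor_seconds(200, 0.16)
```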


Also, you can try increasing the timeout on that worker. Possibly you’re doing something computationally intensive that just won’t fit into the framework’s default time slice.
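The `[CRITICAL] WORKER TIMEOUT` lines in the app logs are what Gunicorn prints when a sync worker exceeds its limit, which defaults to 30 seconds. If that one page is legitimately slow, the limit can be raised in `gunicorn.conf.py` (illustrative values, not a recommendation for this specific app):

```python
# gunicorn.conf.py -- illustrative values only
timeout = 120          # seconds a worker may take per request before being killed
graceful_timeout = 30  # grace period given to workers on restart
workers = 2            # tune to the machine's CPU and memory
```

Raising the timeout only masks the symptom, though; if the slowness comes from cross-region database round trips, moving the app and database into the same region is the real fix.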
