PostgreSQL down on Fly.io

I have a critical issue with my PostgreSQL database on Fly. I made no changes before the issue started (see logs below). CPU usage skyrocketed around 10:40 (Swedish time), and our site has been down since then.

Any help would be much appreciated, I’m not an expert in this area.

“500 Internal Server Error failed to connect to local node: failed to connect to host=****:*:****:a7b:e9:c4b1:****:* user=repmgr database=repmgr: server error (FATAL: the database system is not yet accepting connections (SQLSTATE 57P03))”


2024-03-06T11:08:05.099 app[**************] arn [info] postgres | 2024-03-06 11:08:05.097 UTC [19415] LOG: database system shutdown was interrupted; last known up at 2024-03-06 11:08:05 UTC

2024-03-06T11:08:05.164 app[**************] arn [info] postgres | 2024-03-06 11:08:05.164 UTC [19415] LOG: database system was not properly shut down; automatic recovery in progress

2024-03-06T11:08:05.165 app[**************] arn [info] postgres | 2024-03-06 11:08:05.165 UTC [19415] LOG: redo starts at 4/8B722E0

2024-03-06T11:08:05.167 app[**************] arn [info] postgres | 2024-03-06 11:08:05.166 UTC [19415] LOG: invalid record length at 4/8BBE288: wanted 24, got 0

2024-03-06T11:08:05.167 app[**************] arn [info] postgres | 2024-03-06 11:08:05.166 UTC [19415] LOG: redo done at 4/8BBE260 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

2024-03-06T11:08:05.169 app[**************] arn [info] postgres | 2024-03-06 11:08:05.168 UTC [19416] LOG: checkpoint starting: end-of-recovery immediate wait

2024-03-06T11:08:05.172 app[**************] arn [info] [ 1231.255550] EXT4-fs (vdb): Delayed block allocation failed for inode 1649 at logical offset 8240 with max blocks 2 with error 117

2024-03-06T11:08:05.172 app[**************] arn [info] [ 1231.257338] EXT4-fs (vdb): This should not happen!! Data will be lost

2024-03-06T11:08:05.173 app[**************] arn [info] [ 1231.257338]

2024-03-06T11:08:05.173 app[**************] arn [info] postgres | 2024-03-06 11:08:05.173 UTC [19416] PANIC: could not flush dirty data: Structure needs cleaning

2024-03-06T11:08:05.174 app[**************] arn [info] postgres | 2024-03-06 11:08:05.173 UTC [328] LOG: checkpointer process (PID 19416) was terminated by signal 6: Aborted

2024-03-06T11:08:05.174 app[**************] arn [info] postgres | 2024-03-06 11:08:05.173 UTC [328] LOG: terminating any other active server processes

2024-03-06T11:08:05.177 app[**************] arn [info] postgres | 2024-03-06 11:08:05.174 UTC [328] LOG: all server processes terminated; reinitializing

2024-03-06T11:08:05.187 app[**************] arn [info] postgres | 2024-03-06 11:08:05.185 UTC [19418] LOG: database system shutdown was interrupted; last known up at 2024-03-06 11:08:05 UTC

2024-03-06T11:08:05.248 app[**************] arn [info] postgres | 2024-03-06 11:08:05.248 UTC [19418] LOG: database system was not properly shut down; automatic recovery in progress

2024-03-06T11:08:05.249 app[**************] arn [info] postgres | 2024-03-06 11:08:05.249 UTC [19418] LOG: redo starts at 4/8B722E0

2024-03-06T11:08:05.251 app[**************] arn [info] postgres | 2024-03-06 11:08:05.250 UTC [19418] LOG: invalid record length at 4/8BBE288: wanted 24, got 0

2024-03-06T11:08:05.251 app[**************] arn [info] postgres | 2024-03-06 11:08:05.250 UTC [19418] LOG: redo done at 4/8BBE260 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

2024-03-06T11:08:05.253 app[**************] arn [info] postgres | 2024-03-06 11:08:05.252 UTC [19419] LOG: checkpoint starting: end-of-recovery immediate wait

2024-03-06T11:08:05.255 app[**************] arn [info] [ 1231.339454] EXT4-fs (vdb): Delayed block allocation failed for inode 1649 at logical offset 8240 with max blocks 2 with error 117

2024-03-06T11:08:05.256 app[**************] arn [info] [ 1231.340933] EXT4-fs (vdb): This should not happen!! Data will be lost

2024-03-06T11:08:05.256 app[**************] arn [info] [ 1231.340933]

2024-03-06T11:08:05.256 app[**************] arn [info] postgres | 2024-03-06 11:08:05.256 UTC [19419] PANIC: could not flush dirty data: Structure needs cleaning

2024-03-06T11:08:05.258 app[**************] arn [info] postgres | 2024-03-06 11:08:05.257 UTC [328] LOG: checkpointer process (PID 19419) was terminated by signal 6: Aborted

2024-03-06T11:08:05.258 app[**************] arn [info] postgres | 2024-03-06 11:08:05.257 UTC [328] LOG: terminating any other active server processes

2024-03-06T11:08:05.260 app[**************] arn [info] postgres | 2024-03-06 11:08:05.258 UTC [328] LOG: all server processes terminated; reinitializing

2024-03-06T11:08:05.271 app[**************] arn [info] postgres | 2024-03-06 11:08:05.268 UTC [19421] LOG: database system shutdown was interrupted; last known up at 2024-03-06 11:08:05 UTC

2024-03-06T11:08:05.331 app[**************] arn [info] postgres | 2024-03-06 11:08:05.330 UTC [19421] LOG: database system was not properly shut down; automatic recovery in progress

2024-03-06T11:08:05.332 app[**************] arn [info] postgres | 2024-03-06 11:08:05.332 UTC [19421] LOG: redo starts at 4/8B722E0

2024-03-06T11:08:05.334 app[**************] arn [info] postgres | 2024-03-06 11:08:05.333 UTC [19421] LOG: invalid record length at 4/8BBE288: wanted 24, got 0

2024-03-06T11:08:05.334 app[**************] arn [info] postgres | 2024-03-06 11:08:05.334 UTC [19421] LOG: redo done at 4/8BBE260 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

2024-03-06T11:08:05.336 app[**************] arn [info] postgres | 2024-03-06 11:08:05.336 UTC [19422] LOG: checkpoint starting: end-of-recovery immediate wait

2024-03-06T11:08:05.339 app[**************] arn [info] [ 1231.423582] EXT4-fs (vdb): Delayed block allocation failed for inode 1649 at logical offset 8240 with max blocks 2 with error 117

2024-03-06T11:08:05.340 app[**************] arn [info] [ 1231.425228] EXT4-fs (vdb): This should not happen!! Data will be lost

2024-03-06T11:08:05.341 app[**************] arn [info] [ 1231.425228]

2024-03-06T11:08:05.341 app[**************] arn [info] postgres | 2024-03-06 11:08:05.341 UTC [19422] PANIC: could not flush dirty data: Structure needs cleaning

2024-03-06T11:08:05.342 app[**************] arn [info] postgres | 2024-03-06 11:08:05.341 UTC [328] LOG: checkpointer process (PID 19422) was terminated by signal 6: Aborted

2024-03-06T11:08:05.342 app[**************] arn [info] postgres | 2024-03-06 11:08:05.341 UTC [328] LOG: terminating any other active server processes

2024-03-06T11:08:05.345 app[**************] arn [info] postgres | 2024-03-06 11:08:05.342 UTC [328] LOG: all server processes terminated; reinitializing

2024-03-06T11:08:05.355 app[**************] arn [info] postgres | 2024-03-06 11:08:05.352 UTC [19424] LOG: database system shutdown was interrupted; last known up at 2024-03-06 11:08:05 UTC

2024-03-06T11:08:05.414 app[**************] arn [info] postgres | 2024-03-06 11:08:05.413 UTC [19424] LOG: database system was not properly shut down; automatic recovery in progress

2024-03-06T11:08:05.415 app[**************] arn [info] postgres | 2024-03-06 11:08:05.415 UTC [19424] LOG: redo starts at 4/8B722E0

2024-03-06T11:08:05.416 app[**************] arn [info] postgres | 2024-03-06 11:08:05.416 UTC [19424] LOG: invalid record length at 4/8BBE288: wanted 24, got 0

2024-03-06T11:08:05.416 app[**************] arn [info] postgres | 2024-03-06 11:08:05.416 UTC [19424] LOG: redo done at 4/8BBE260 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s

2024-03-06T11:08:05.418 app[**************] arn [info] postgres | 2024-03-06 11:08:05.418 UTC [19425] LOG: checkpoint starting: end-of-recovery immediate wait

2024-03-06T11:08:05.421 app[**************] arn [info] [ 1231.505370] EXT4-fs (vdb): Delayed block allocation failed for inode 1649 at logical offset 8240 with max blocks 2 with error 117

2024-03-06T11:08:05.422 app[**************] arn [info] [ 1231.506881] EXT4-fs (vdb): This should not happen!! Data will be lost
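
For anyone reading logs like these: the same sequence repeats every ~80 ms (recovery, end-of-recovery checkpoint, EXT4 error, PANIC, reinitialize). On Linux, errno 117 is EUCLEAN, whose message is "Structure needs cleaning", the same text as in the Postgres PANIC, which suggests the failure comes from the filesystem on the volume (vdb) rather than from Postgres itself. Below is a minimal sketch, assuming Python 3, that counts those events in a saved log. The function name `summarize` and the category labels are made up for illustration and are not part of any Fly.io or Postgres tool; the patterns are taken only from the lines in this post.

```python
import re
from collections import Counter

# Patterns copied from the log lines in this post. The labels are
# illustrative names, not identifiers from Fly.io or Postgres.
PATTERNS = {
    "ext4_error": re.compile(
        r"EXT4-fs \((?P<dev>\w+)\): .* with error (?P<code>\d+)"
    ),
    "postgres_panic": re.compile(r"PANIC:\s+(?P<msg>.+)$"),
    "postmaster_restart": re.compile(
        r"all server processes terminated; reinitializing"
    ),
}


def summarize(lines):
    """Count crash-loop events and collect the distinct error details."""
    counts = Counter()
    details = set()
    for line in lines:
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match is None:
                continue
            counts[label] += 1
            if label == "ext4_error":
                details.add(f"{match['dev']}: error {match['code']}")
            elif label == "postgres_panic":
                details.add(match["msg"].strip())
    return counts, sorted(details)


# Three lines taken from the logs above (timestamps shortened).
sample_lines = [
    "[ 1231.255550] EXT4-fs (vdb): Delayed block allocation failed for "
    "inode 1649 at logical offset 8240 with max blocks 2 with error 117",
    "postgres | 2024-03-06 11:08:05.173 UTC [19416] PANIC: could not "
    "flush dirty data: Structure needs cleaning",
    "postgres | 2024-03-06 11:08:05.174 UTC [328] LOG: all server "
    "processes terminated; reinitializing",
]

counts, details = summarize(sample_lines)
print(dict(counts))
print(details)
```

If the number of restarts matches the number of EXT4 errors and PANICs, the database is most likely stuck in a crash loop on the same damaged block, and restarting Postgres alone is unlikely to help.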

UPDATE:
We “fixed” our problem by creating a new machine and volume from a snapshot and attaching the web application to the new machine.

We still don’t know why the issue appeared, and we lost some data in the snapshot restore.

