Postgres SQL down on Fly.io

Evently · March 6, 2024, 11:31am

I have a critical issue with my Postgres database on Fly. I made no changes before the issue started (see logs below). CPU usage spiked around 10:40 (Swedish time), and our site has been down since then.

Any help would be much appreciated; I’m not an expert in this area.

“500 Internal Server Error failed to connect to local node: failed to connect to host=****:*:****:a7b:e9:c4b1:****:* user=repmgr database=repmgr: server error (FATAL: the database system is not yet accepting connections (SQLSTATE 57P03))”


2024-03-06T11:08:05.099 app[**************] arn [info] postgres | 2024-03-06 11:08:05.097 UTC [19415] LOG: database system shutdown was interrupted; last known up at 2024-03-06 11:08:05 UTC
2024-03-06T11:08:05.164 app[**************] arn [info] postgres | 2024-03-06 11:08:05.164 UTC [19415] LOG: database system was not properly shut down; automatic recovery in progress
2024-03-06T11:08:05.165 app[**************] arn [info] postgres | 2024-03-06 11:08:05.165 UTC [19415] LOG: redo starts at 4/8B722E0
2024-03-06T11:08:05.167 app[**************] arn [info] postgres | 2024-03-06 11:08:05.166 UTC [19415] LOG: invalid record length at 4/8BBE288: wanted 24, got 0
2024-03-06T11:08:05.167 app[**************] arn [info] postgres | 2024-03-06 11:08:05.166 UTC [19415] LOG: redo done at 4/8BBE260 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-03-06T11:08:05.169 app[**************] arn [info] postgres | 2024-03-06 11:08:05.168 UTC [19416] LOG: checkpoint starting: end-of-recovery immediate wait
2024-03-06T11:08:05.172 app[**************] arn [info] [ 1231.255550] EXT4-fs (vdb): Delayed block allocation failed for inode 1649 at logical offset 8240 with max blocks 2 with error 117
2024-03-06T11:08:05.172 app[**************] arn [info] [ 1231.257338] EXT4-fs (vdb): This should not happen!! Data will be lost
2024-03-06T11:08:05.173 app[**************] arn [info] [ 1231.257338]
2024-03-06T11:08:05.173 app[**************] arn [info] postgres | 2024-03-06 11:08:05.173 UTC [19416] PANIC: could not flush dirty data: Structure needs cleaning
2024-03-06T11:08:05.174 app[**************] arn [info] postgres | 2024-03-06 11:08:05.173 UTC [328] LOG: checkpointer process (PID 19416) was terminated by signal 6: Aborted
2024-03-06T11:08:05.174 app[**************] arn [info] postgres | 2024-03-06 11:08:05.173 UTC [328] LOG: terminating any other active server processes
2024-03-06T11:08:05.177 app[**************] arn [info] postgres | 2024-03-06 11:08:05.174 UTC [328] LOG: all server processes terminated; reinitializing
2024-03-06T11:08:05.187 app[**************] arn [info] postgres | 2024-03-06 11:08:05.185 UTC [19418] LOG: database system shutdown was interrupted; last known up at 2024-03-06 11:08:05 UTC
2024-03-06T11:08:05.248 app[**************] arn [info] postgres | 2024-03-06 11:08:05.248 UTC [19418] LOG: database system was not properly shut down; automatic recovery in progress
2024-03-06T11:08:05.249 app[**************] arn [info] postgres | 2024-03-06 11:08:05.249 UTC [19418] LOG: redo starts at 4/8B722E0
2024-03-06T11:08:05.251 app[**************] arn [info] postgres | 2024-03-06 11:08:05.250 UTC [19418] LOG: invalid record length at 4/8BBE288: wanted 24, got 0
2024-03-06T11:08:05.251 app[**************] arn [info] postgres | 2024-03-06 11:08:05.250 UTC [19418] LOG: redo done at 4/8BBE260 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-03-06T11:08:05.253 app[**************] arn [info] postgres | 2024-03-06 11:08:05.252 UTC [19419] LOG: checkpoint starting: end-of-recovery immediate wait
2024-03-06T11:08:05.255 app[**************] arn [info] [ 1231.339454] EXT4-fs (vdb): Delayed block allocation failed for inode 1649 at logical offset 8240 with max blocks 2 with error 117
2024-03-06T11:08:05.256 app[**************] arn [info] [ 1231.340933] EXT4-fs (vdb): This should not happen!! Data will be lost
2024-03-06T11:08:05.256 app[**************] arn [info] [ 1231.340933]
2024-03-06T11:08:05.256 app[**************] arn [info] postgres | 2024-03-06 11:08:05.256 UTC [19419] PANIC: could not flush dirty data: Structure needs cleaning
2024-03-06T11:08:05.258 app[**************] arn [info] postgres | 2024-03-06 11:08:05.257 UTC [328] LOG: checkpointer process (PID 19419) was terminated by signal 6: Aborted
2024-03-06T11:08:05.258 app[**************] arn [info] postgres | 2024-03-06 11:08:05.257 UTC [328] LOG: terminating any other active server processes
2024-03-06T11:08:05.260 app[**************] arn [info] postgres | 2024-03-06 11:08:05.258 UTC [328] LOG: all server processes terminated; reinitializing
2024-03-06T11:08:05.271 app[**************] arn [info] postgres | 2024-03-06 11:08:05.268 UTC [19421] LOG: database system shutdown was interrupted; last known up at 2024-03-06 11:08:05 UTC
2024-03-06T11:08:05.331 app[**************] arn [info] postgres | 2024-03-06 11:08:05.330 UTC [19421] LOG: database system was not properly shut down; automatic recovery in progress
2024-03-06T11:08:05.332 app[**************] arn [info] postgres | 2024-03-06 11:08:05.332 UTC [19421] LOG: redo starts at 4/8B722E0
2024-03-06T11:08:05.334 app[**************] arn [info] postgres | 2024-03-06 11:08:05.333 UTC [19421] LOG: invalid record length at 4/8BBE288: wanted 24, got 0
2024-03-06T11:08:05.334 app[**************] arn [info] postgres | 2024-03-06 11:08:05.334 UTC [19421] LOG: redo done at 4/8BBE260 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-03-06T11:08:05.336 app[**************] arn [info] postgres | 2024-03-06 11:08:05.336 UTC [19422] LOG: checkpoint starting: end-of-recovery immediate wait
2024-03-06T11:08:05.339 app[**************] arn [info] [ 1231.423582] EXT4-fs (vdb): Delayed block allocation failed for inode 1649 at logical offset 8240 with max blocks 2 with error 117
2024-03-06T11:08:05.340 app[**************] arn [info] [ 1231.425228] EXT4-fs (vdb): This should not happen!! Data will be lost
2024-03-06T11:08:05.341 app[**************] arn [info] [ 1231.425228]
2024-03-06T11:08:05.341 app[**************] arn [info] postgres | 2024-03-06 11:08:05.341 UTC [19422] PANIC: could not flush dirty data: Structure needs cleaning
2024-03-06T11:08:05.342 app[**************] arn [info] postgres | 2024-03-06 11:08:05.341 UTC [328] LOG: checkpointer process (PID 19422) was terminated by signal 6: Aborted
2024-03-06T11:08:05.342 app[**************] arn [info] postgres | 2024-03-06 11:08:05.341 UTC [328] LOG: terminating any other active server processes
2024-03-06T11:08:05.345 app[**************] arn [info] postgres | 2024-03-06 11:08:05.342 UTC [328] LOG: all server processes terminated; reinitializing
2024-03-06T11:08:05.355 app[**************] arn [info] postgres | 2024-03-06 11:08:05.352 UTC [19424] LOG: database system shutdown was interrupted; last known up at 2024-03-06 11:08:05 UTC
2024-03-06T11:08:05.414 app[**************] arn [info] postgres | 2024-03-06 11:08:05.413 UTC [19424] LOG: database system was not properly shut down; automatic recovery in progress
2024-03-06T11:08:05.415 app[**************] arn [info] postgres | 2024-03-06 11:08:05.415 UTC [19424] LOG: redo starts at 4/8B722E0
2024-03-06T11:08:05.416 app[**************] arn [info] postgres | 2024-03-06 11:08:05.416 UTC [19424] LOG: invalid record length at 4/8BBE288: wanted 24, got 0
2024-03-06T11:08:05.416 app[**************] arn [info] postgres | 2024-03-06 11:08:05.416 UTC [19424] LOG: redo done at 4/8BBE260 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-03-06T11:08:05.418 app[**************] arn [info] postgres | 2024-03-06 11:08:05.418 UTC [19425] LOG: checkpoint starting: end-of-recovery immediate wait
2024-03-06T11:08:05.421 app[**************] arn [info] [ 1231.505370] EXT4-fs (vdb): Delayed block allocation failed for inode 1649 at logical offset 8240 with max blocks 2 with error 117
2024-03-06T11:08:05.422 app[**************] arn [info] [ 1231.506881] EXT4-fs (vdb): This should not happen!! Data will be lost
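
The log above repeats the same cycle several times per second: Postgres starts automatic recovery, the kernel reports an EXT4 error on the volume, and the checkpointer hits a PANIC. As a rough aid for reading logs like this, here is a minimal Python sketch that counts those three events in pasted log lines. The function name `summarize`, the `PATTERNS` table, and the `SAMPLE` list are hypothetical names made up for illustration; the sample lines are copied from the log above. It only summarizes the text and does not diagnose or repair anything.

```python
import re
from collections import Counter

# Regexes for the three events that repeat in the log excerpt.
# These are illustrative patterns written against the lines shown above,
# not an official log format.
PATTERNS = {
    "recovery_started": re.compile(r"automatic recovery in progress"),
    "ext4_error": re.compile(r"EXT4-fs \((?P<dev>\w+)\): .* with error (?P<errno>\d+)"),
    "postgres_panic": re.compile(r"PANIC:\s+(?P<msg>.+)$"),
}


def summarize(lines):
    """Count crash-loop events; collect distinct (device, errno) pairs and PANIC messages."""
    counts = Counter()
    details = {"ext4": set(), "panic": set()}
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if not match:
                continue
            counts[name] += 1
            if name == "ext4_error":
                details["ext4"].add((match["dev"], int(match["errno"])))
            elif name == "postgres_panic":
                details["panic"].add(match["msg"].strip())
    return counts, details


# Three lines copied from one cycle of the log above.
SAMPLE = [
    "2024-03-06T11:08:05.164 app[**************] arn [info] postgres | 2024-03-06 11:08:05.164 UTC [19415] LOG: database system was not properly shut down; automatic recovery in progress",
    "2024-03-06T11:08:05.172 app[**************] arn [info] [ 1231.255550] EXT4-fs (vdb): Delayed block allocation failed for inode 1649 at logical offset 8240 with max blocks 2 with error 117",
    "2024-03-06T11:08:05.173 app[**************] arn [info] postgres | 2024-03-06 11:08:05.173 UTC [19416] PANIC: could not flush dirty data: Structure needs cleaning",
]

if __name__ == "__main__":
    counts, details = summarize(SAMPLE)
    print(dict(counts))
    print(details)
```

On Linux, errno 117 is EUCLEAN, whose message text is "Structure needs cleaning", so the kernel's "error 117" and the Postgres PANIC appear to describe the same filesystem-level failure on the `vdb` volume rather than two separate problems.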

Evently · March 6, 2024, 2:09pm

UPDATE:
We “fixed” our problem by creating a new machine and volume from a snapshot and attaching the web application to the new machine.

We still don’t know what caused the issue, and we lost some data in the snapshot restore.


