Postgres cannot restart

I haven’t changed anything, but Postgres seems to have stopped working. What can I do to fix this? Here is what I see in the logs:

2024-09-11T17:52:56.369 app[73d8d390a7e789] hkg [info] [ 2.140771] reboot: Restarting system
2024-09-11T17:57:09.600 app[73d8d390a7e789] hkg [info] Starting init (commit: 81d5330)...
2024-09-11T17:57:09.621 app[73d8d390a7e789] hkg [info] Mounting /dev/vdb at /data w/ uid: 0, gid: 0 and chmod 0755
2024-09-11T17:57:09.627 app[73d8d390a7e789] hkg [info] Preparing to run: `docker-entrypoint.sh start` as root
2024-09-11T17:57:09.661 app[73d8d390a7e789] hkg [info] 2024/09/11 17:57:09 INFO SSH listening listen_address=[fdaa:0:d10e:a7b:7f07:5760:cff7:2]:22 dns_server=[fdaa::3]:53
2024-09-11T17:57:09.753 app[73d8d390a7e789] hkg [info] cluster spec filename /fly/cluster-spec.json
2024-09-11T17:57:09.755 app[73d8d390a7e789] hkg [info] panic: error loading cluster spec: unexpected end of JSON input
2024-09-11T17:57:09.755 app[73d8d390a7e789] hkg [info] goroutine 1 [running]:
2024-09-11T17:57:09.756 app[73d8d390a7e789] hkg [info] main.main()
2024-09-11T17:57:09.756 app[73d8d390a7e789] hkg [info] /go/src/github.com/fly-examples/postgres-ha/cmd/start/main.go:69 +0x1bbb
2024-09-11T17:57:09.864 runner[73d8d390a7e789] hkg [info] Machine started in 486ms
2024-09-11T17:57:10.638 app[73d8d390a7e789] hkg [info] Starting clean up.
2024-09-11T17:57:10.638 app[73d8d390a7e789] hkg [info] Umounting /dev/vdb from /data
2024-09-11T17:57:11.642 app[73d8d390a7e789] hkg [info] [ 2.141626] reboot: Restarting system

Odd… Judging from the context, does that same JSON parsing error persist across multiple restart attempts? How long ago was the database created? (It looks like the older, Stolon variety.)

$ fly m list     -a db-app-name  # will reveal number of nodes
$ fly image show -a db-app-name  # will say `postgres` instead of
                                 #  `postgres-flex` if Stolon
$ fly m status   -a db-app-name  # may mention a migration

If I understand correctly, the /fly/ directory lives on the root partition rather than on the persistent volume, so this might be a case of the Machine’s overlays (or something similar) getting a little wedged. (Apparently, something along the same very general lines can happen with /etc/hosts.) It might be necessary to fork the volume and then attach the fork to a freshly cloned Machine.
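For what it’s worth, the panic text itself is a clue: “unexpected end of JSON input” is the message Go’s encoding/json package (json.Unmarshal) returns when the input stops before a complete JSON value, i.e. an empty or truncated file. Malformed-but-complete JSON produces an “invalid character …” error instead. That fits the theory that /fly/cluster-spec.json ended up empty or cut short. The sketch below reproduces the message; clusterSpec and loadClusterSpec are hypothetical stand-ins, not the actual postgres-ha code or schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// clusterSpec is a HYPOTHETICAL stand-in for the struct the real
// postgres-ha start command decodes; the field is not the real schema.
type clusterSpec struct {
	Keepers int `json:"keepers"`
}

// loadClusterSpec is a HYPOTHETICAL helper mimicking the failing step:
// it decodes raw file contents with json.Unmarshal and wraps any error
// with the same prefix that appears in the panic message.
func loadClusterSpec(data []byte) (*clusterSpec, error) {
	var spec clusterSpec
	if err := json.Unmarshal(data, &spec); err != nil {
		return nil, fmt.Errorf("error loading cluster spec: %w", err)
	}
	return &spec, nil
}

func main() {
	// A slice (not a map) keeps the output order deterministic.
	cases := []struct {
		name string
		data []byte
	}{
		{"empty file", []byte("")},
		{"truncated file", []byte(`{"keepers": 1`)},
		{"malformed file", []byte(`{"keepers": x}`)},
		{"valid file", []byte(`{"keepers": 1}`)},
	}
	for _, c := range cases {
		_, err := loadClusterSpec(c.data)
		fmt.Printf("%-15s -> %v\n", c.name, err)
	}
}
```

Both the empty and the truncated inputs yield exactly the error from the log, while the malformed input yields an “invalid character” error, so a zero-length or partially written spec file is the more likely culprit.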


Aside: Other ways of detecting Machine migrations…

https://community.fly.io/t/psa-machine-migration-has-started-again/20265/13

Yes, this database is very old. I created it two years ago, and I haven’t changed any database parameters since (but I use the database itself almost every week). I will try the fork idea and let you know how it goes.

When I try to fork the database:

Error: Failed to resolve the volume associated with the primary instance. See fly pg create --help for more information

When I try to manually start my machine from the browser dashboard:

Machine failed to start: Unknown response while starting machine

When I try to manually start my machine from terminal:

$ fly machines start xxx
Error: could not start machine xxx: failed to start VM xxx: failed_precondition: machine still active, refusing to start (Request ID: 01J82DZCDRBHQ1CVT0F2PQGJKJ-hkg)

Does anyone have any idea how to get this up and running again?

What does fly m status xxx say at this point?

(fly m list, fly image show, and fly vol list would probably also help; we forum readers are rather in the dark, otherwise.)

Apologies. I have no idea what I’m doing. I just know it was working a few weeks ago, and now it’s stopped working. Please see below for the information.

PS D:\> fly m status 73d8d390a7e789
Machine ID: 73d8d390a7e789
Instance ID: 01GH5SP6BWQW6CN4EK7HW1WV7A
State: stopped
HostStatus: ok

VM
  ID            = 73d8d390a7e789
  Instance ID   = 01GH5SP6BWQW6CN4EK7HW1WV7A
  State         = stopped
  Image         = flyio/postgres:14.4 (v0.0.32)
  Name          = dawn-frost-6867
  Private IP    = fdaa:0:d10e:a7b:7f07:5760:cff7:2
  Region        = hkg
  Process Group =
  CPU Kind      = shared
  vCPUs         = 1
  Memory        = 256
  Created       = 2022-11-06T06:22:46Z
  Updated       = 2024-09-19T13:25:13Z
  Entrypoint    =
  Command       =
  Volume        = vol_18l524y36x1v7zmp

Checks [0/3]
NAME    STATUS  LAST UPDATED    OUTPUT
vm      warning 35s ago         the machine hasn't started
pg      warning 35s ago         the machine hasn't started
role    warning 35s ago         the machine hasn't started

Event Logs
STATE           EVENT   SOURCE  TIMESTAMP                       INFO
stopped         exit    flyd    2024-09-19T21:25:13.308+08:00   exit_code=2,oom_killed=false,requested_stop=false
started         start   flyd    2024-09-19T21:25:11.154+08:00
starting        start   flyd    2024-09-19T21:25:10.606+08:00
stopped         exit    flyd    2024-09-19T21:21:46.221+08:00   exit_code=2,oom_killed=false,requested_stop=false
started         start   flyd    2024-09-19T21:21:44.104+08:00
PS D:\> fly m list -a kit-api-db
1 machines have been retrieved from app kit-api-db.
View them in the UI here

kit-api-db
ID              NAME            STATE   CHECKS  REGION  ROLE                            IMAGE                          IP ADDRESS                       VOLUME                  CREATED                 LAST UPDATED            PROCESS GROUP  SIZE
73d8d390a7e789  dawn-frost-6867 stopped 0/3     hkg     the machine hasn't started      flyio/postgres:14.4 (v0.0.32)  fdaa:0:d10e:a7b:7f07:5760:cff7:2 vol_18l524y36x1v7zmp    2022-11-06T06:22:46Z    2024-09-19T13:25:13Z                   shared-cpu-1x:256MB
PS D:\> fly image show -a kit-api-db
Updates available:

Machine "73d8d390a7e789" flyio/postgres:14.4 (v0.0.32) -> flyio/postgres:14.6 (v0.0.41)

Run `flyctl image update` to migrate to the latest image version.
Image Details
MACHINE ID      REGISTRY                REPOSITORY      TAG     VERSION DIGEST                                                                  LABELS
73d8d390a7e789  registry-1.docker.io    flyio/postgres  14.4    v0.0.32 sha256:9daaa15119742e5777f5480ef476024e8827016718b5b020ef33a5fb084b60e8 fly.version=v0.0.32, fly.app_role=postgres_cluster, fly.pg-version=14.4-1.pgdg110+1

PS D:\> fly vol list -a kit-api-db
ID                      STATE   NAME    SIZE    REGION  ZONE    ENCRYPTED       ATTACHED VM     CREATED AT
vol_18l524y36x1v7zmp    created pg_data 1GB     hkg     c8cf    true            73d8d390a7e789  2 years ago

Thanks for the additional details… These Fly Postgres instances require a lot of manual intervention sometimes, although the particular error you’re seeing is unusually opaque. It might help to try this lower-level way of forking now:

$ fly vol fork vol_18l524y36x1v7zmp -a kit-api-db
$ fly vol list -a kit-api-db  # note new volume ID
$ fly m stop  73d8d390a7e789 -a kit-api-db  # double-check
$ fly m clone 73d8d390a7e789 --attach-volume vol_xxx -a kit-api-db
$ fly m list -a kit-api-db  # verify that the new machine did pick
                            #  up the forked volume.

(Replace vol_xxx with the new volume ID.)

If that also fails, then the next thing to try would be restoring from a snapshot. (The snapshots are stored separately, as I understand it.)

Sorry you’ve been having so much trouble with this!

PS D:\> fly vol fork vol_18l524y36x1v7zmp -a kit-api-db
Error: failed to fork volume: failed to create volume: We need your payment information to continue! Add a credit card or buy credit: https://fly.io/dashboard/victor-lin/billing (Request ID: 01J87WEV21EXYX82KW2AQWYWK3-hkg)

Hmm, is there any way around this?

I don’t think so… Judging by recent forum posts, they’ve been getting progressively stricter about requiring a credit card (or pre-purchased credits).


If you have a lot of unnecessary machines or volumes around, then it might help to delete them, but that’s a bit of a long shot.

Yeah, unfortunately I only have one other machine, and it’s needed for that other project. If I switch to a managed database like Supabase, do you know whether I can retrieve the data from my current, suspended machine?

Ouch… I admittedly don’t know much about Supabase, but odds are that your current machine would need to be running in order to migrate.

It might be possible to override the existing machine’s startup sequence, so that you could at least SFTP the raw data files off of it.

(Caveat: the comment about fly pg import in that older post is unfortunately over-optimistic; too late to edit!)

You would still need to feed those files to a Linux Postgres 14.x instance running locally (on your desktop or laptop), which involves a lot of unfamiliar steps if you’ve never run PG that way before.

Alternatively, you could try contacting billing@fly.io to see whether they might lift the restriction long enough for you to grab† your data. Any user can email that address; unlike other support channels, it isn’t restricted to paid plans. There may be a delay of a few days before a response arrives, though…

†If the PG server is running, then pg_dump is the best bet. (Or pg_dumpall, if you were doing fancy things with roles, multiple databases, etc.)


It seems like getting my data might take a little too much time and effort, so I will try and start anew with a database on Supabase. Thank you so much for all the help, mayailurus. You are an asset to humanity!
