Scaling a Postgres instance results in infinite restarts

Hi, my DB lives in a single region, and I wanted to add another Machine in a different region.

fly machine clone --region lax 11116e11001111 --app app-db

But the new instance keeps restarting:

$ fly logs -a app-db -i 784eee0c212658

Waiting for logs...

2025-03-03T08:33:10.942 app[784eee0c212658] phx [info] 2025-03-03T08:33:10.942190267 [01JNDJ2VKDCQA8JBMTAH1FJFG2:main] Running Firecracker v1.7.0

2025-03-03T08:33:11.770 app[784eee0c212658] phx [info] INFO Starting init (commit: 67f51b8b)...

2025-03-03T08:33:11.875 app[784eee0c212658] phx [info] INFO Checking filesystem on /data

2025-03-03T08:33:11.877 app[784eee0c212658] phx [info] /dev/vdc: clean, 19/65280 files, 8850/261120 blocks

2025-03-03T08:33:11.878 app[784eee0c212658] phx [info] INFO Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755

2025-03-03T08:33:11.880 app[784eee0c212658] phx [info] INFO Resized /data to 1069547520 bytes

2025-03-03T08:33:11.887 app[784eee0c212658] phx [info] INFO Preparing to run: `start` as root

2025-03-03T08:33:11.893 app[784eee0c212658] phx [info] INFO [fly api proxy] listening at /.fly/api

2025-03-03T08:33:11.922 runner[784eee0c212658] phx [info] Machine started in 1.048s

2025-03-03T08:33:12.145 app[784eee0c212658] phx [info] Provisioning standby

2025-03-03T08:33:12.179 app[784eee0c212658] phx [info] 2025/03/03 08:33:12 INFO SSH listening listen_address=[fdaa:0:631b:a7b:c9:2476:6dcf:2]:22

2025-03-03T08:33:13.160 app[784eee0c212658] phx [info] panic: failed to resolve member over dns: unable to resolve cloneable member

2025-03-03T08:33:13.160 app[784eee0c212658] phx [info] goroutine 1 [running]:

2025-03-03T08:33:13.160 app[784eee0c212658] phx [info] main.panicHandler({0x9bfe60?, 0xc00010eb10})

2025-03-03T08:33:13.160 app[784eee0c212658] phx [info] /go/src/github.com/fly-apps/fly-postgres/cmd/start/main.go:190 +0x55

2025-03-03T08:33:13.160 app[784eee0c212658] phx [info] main.main()

2025-03-03T08:33:13.160 app[784eee0c212658] phx [info] /go/src/github.com/fly-apps/fly-postgres/cmd/start/main.go:67 +0xef0

2025-03-03T08:33:13.894 app[784eee0c212658] phx [info] INFO Main child exited normally with code: 2

2025-03-03T08:33:13.909 app[784eee0c212658] phx [info] INFO Starting clean up.

2025-03-03T08:33:13.955 app[784eee0c212658] phx [info] INFO Umounting /dev/vdc from /data

2025-03-03T08:33:13.956 app[784eee0c212658] phx [info] WARN could not unmount /rootfs: EINVAL: Invalid argument

2025-03-03T08:33:13.957 app[784eee0c212658] phx [info] [ 2.954429] reboot: Restarting system

2025-03-03T08:34:32.724 app[784eee0c212658] phx [info] 2025-03-03T08:34:32.724105964 [01JNDJ2VKDCQA8JBMTAH1FJFG2:main] Running Firecracker v1.7.0

2025-03-03T08:34:33.542 app[784eee0c212658] phx [info] INFO Starting init (commit: 67f51b8b)...

2025-03-03T08:34:33.650 app[784eee0c212658] phx [info] INFO Checking filesystem on /data

2025-03-03T08:34:33.653 app[784eee0c212658] phx [info] /dev/vdc: clean, 19/65280 files, 8850/261120 blocks

2025-03-03T08:34:33.654 app[784eee0c212658] phx [info] INFO Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755

2025-03-03T08:34:33.656 app[784eee0c212658] phx [info] INFO Resized /data to 1069547520 bytes

2025-03-03T08:34:33.666 app[784eee0c212658] phx [info] INFO Preparing to run: `start` as root

2025-03-03T08:34:33.674 app[784eee0c212658] phx [info] INFO [fly api proxy] listening at /.fly/api

2025-03-03T08:34:33.713 runner[784eee0c212658] phx [info] Machine started in 1.059s

2025-03-03T08:34:33.919 app[784eee0c212658] phx [info] Provisioning standby

2025-03-03T08:34:33.950 app[784eee0c212658] phx [info] 2025/03/03 08:34:33 INFO SSH listening listen_address=[fdaa:0:631b:a7b:c9:2476:6dcf:2]:22

2025-03-03T08:34:34.944 app[784eee0c212658] phx [info] panic: failed to resolve member over dns: unable to resolve cloneable member

I also destroyed the faulty instance and deleted its volume, but when I look at the volumes in the dashboard, I can still see it listed as Pending_destroy.
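
To confirm from a saved log capture that the Machine is crash-looping on the same error every time, a minimal sketch along these lines may help. It assumes the output of fly logs was saved to a text file; the count_restarts name and the db.log file name are hypothetical, and the two search patterns are copied from the log lines above.

```shell
# Hypothetical helper: summarize a saved `fly logs` capture.
# $1 is the path to a text file holding the log output (an assumption;
# the original post only shows the logs streamed to the terminal).
count_restarts() {
  # Each boot of the Machine logs one "Running Firecracker" line.
  # grep -c prints 0 but exits non-zero when nothing matches, hence `|| true`.
  boots=$(grep -c 'Running Firecracker' "$1" || true)
  # Each failed standby provisioning logs this panic line.
  panics=$(grep -c 'panic: failed to resolve member over dns' "$1" || true)
  echo "boots=$boots panics=$panics"
}

# Example usage (db.log is a placeholder file name):
#   fly logs -a app-db -i 784eee0c212658 > db.log   # stop with Ctrl-C
#   count_restarts db.log
```

If the two counts keep rising together, every boot is most likely ending in the same DNS panic rather than in a series of different failures.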

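Hi… I’ve seen this before when the primary was unintentionally asleep, and hence not registered in DNS. That would fit the panic in your logs, where the new standby is unable to resolve a member to clone from.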

In general, distributed Postgres can’t be combined with auto-stop, :dragon:. The latter is really only intended for single-node development setups…
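
One way to check for this situation is to look at the state of every Machine in the database app before cloning, and to start the primary if it turns out to be stopped. The sketch below wraps two flyctl commands, fly machine list and fly machine start. The function names are hypothetical, app-db is the app name from the original post, and the Machine ID to start has to be read from the list output.

```shell
# Hypothetical wrappers around flyctl; they only assemble the commands.

# Show all Machines of the database app, including their current state.
# $1 is the app name (app-db in the original post).
list_db_machines() {
  fly machine list --app "$1"
}

# Start a stopped Machine so that it is running before the clone is attempted.
# $1 is the Machine ID taken from the list output, $2 is the app name.
start_machine() {
  fly machine start "$1" --app "$2"
}

# Example usage (replace the placeholder with the real primary's ID):
#   list_db_machines app-db
#   start_machine <primary-machine-id> app-db
```

This only wakes the primary once. It does not turn auto-stop off, so the same failure can come back the next time the primary goes to sleep.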

Is this written in the docs? If not, it seems weird not to inform users.

It’s not stated completely explicitly, as far as I know, and you’re right that that is weird. This is part of the broader disconnect between user expectations and implementation reality that is causing Fly Postgres to be sidelined now in favor of a fully managed alternative. (That’s my understanding from what Fly.io themselves have said in the past, anyway.)

Basically, the chain of inference is that auto-stop is only for development mode, and development mode is inherently a single Machine. (It’s the second statement that’s not fully nailed down in the docs.)

Distributed Postgres overall is considered an advanced topic, whereas the development configuration is a useful but less serious auxiliary…

Thanks @mayailurus!
