Postgres Replica - could not unmount /rootfs: EINVAL: Invalid argument

When I try to create my third PostgreSQL instance (the second replica), as recommended, it fails to start. Judging by the logs, "could not unmount /rootfs: EINVAL: Invalid argument" seems to be the issue.

Seems like a backend issue? Please advise.

This is still true today: when a machine is being stopped, something logs noisily on the way down. Ignore that warning. The actual cause of the machine going down is somewhere else in your logs.
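To make that advice concrete, here is a minimal sketch of a filter that drops the teardown warning and keeps the lines more likely to explain the failure. The pattern list is an assumption chosen for illustration, not an official list of error markers, and `likely_causes` is a hypothetical helper name.

```python
import re

# Teardown warning printed while a machine stops; per the advice above,
# it is noise rather than the cause.
SHUTDOWN_NOISE = "could not unmount /rootfs"

# Patterns that more plausibly point at the real failure.
# These are assumptions for illustration; extend them for your own logs.
CAUSE_PATTERNS = [
    re.compile(r"\bpanic:"),
    re.compile(r"exited \w+ with code: [1-9]"),
    re.compile(r"\b(ERROR|FATAL)\b"),
]


def likely_causes(log_lines):
    """Return lines matching a cause pattern, skipping the known teardown noise."""
    hits = []
    for line in log_lines:
        if SHUTDOWN_NOISE in line:
            continue
        if any(pattern.search(line) for pattern in CAUSE_PATTERNS):
            hits.append(line.rstrip("\n"))
    return hits
```

Feeding it the lines of a saved log file should leave only the candidates worth reading first.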

Gotcha. Well, I guess the real issue then is that I can’t get a replica to start. In the web interface I get "Machine failed to start: Unknown response while starting machine", and the only other error I see is "exit_code=2,oom_killed=false,requested_stop=false".

Can anyone look into this?

Wouldn’t it be easier and quicker to just delete this replica and create a new one?

I tried that a few times already, with the same results.

That’s very… uncommon. It might help to get the full story of how you created the cluster in the first place (what options you chose to begin with, and then what you did to try to add another replica).

Also, full(er) logs rather than single-sentence extracts would be helpful.

Do you have the ability to view accounts?

I created my primary database and a replica initially and everything was fine. As I moved my app into more of a beta state I tried creating the second replica as recommended in the docs.

That second replica has never been able to start for some reason. Here are the full logs from a failed replica start:

2025-01-30T15:11:14Z app[d8dd9e3f214398] ord [info]panic: failed to resolve member over dns: unable to resolve cloneable member
2025-01-30T15:11:14Z app[d8dd9e3f214398] ord [info]goroutine 1 [running]:
2025-01-30T15:11:14Z app[d8dd9e3f214398] ord [info]main.panicHandler({0xa0c0a0, 0xc0003b4ac0})
2025-01-30T15:11:14Z app[d8dd9e3f214398] ord [info]	/go/src/github.com/fly-apps/fly-postgres/cmd/start/main.go:190 +0x4a
2025-01-30T15:11:14Z app[d8dd9e3f214398] ord [info]main.main()
2025-01-30T15:11:14Z app[d8dd9e3f214398] ord [info]	/go/src/github.com/fly-apps/fly-postgres/cmd/start/main.go:67 +0xe65
2025-01-30T15:11:14Z app[d8dd9e3f214398] ord [info]2025/01/30 15:11:14 INFO SSH listening listen_address=[fdaa:a:ba15:a7b:2aa:5946:a3ac:2]:22 dns_server=[fdaa::3]:53
2025-01-30T15:11:15Z app[d8dd9e3f214398] ord [info] INFO Main child exited normally with code: 2
2025-01-30T15:11:15Z app[d8dd9e3f214398] ord [info] INFO Starting clean up.
2025-01-30T15:11:15Z app[d8dd9e3f214398] ord [info] INFO Umounting /dev/vdc from /data
2025-01-30T15:11:15Z app[d8dd9e3f214398] ord [info] WARN could not unmount /rootfs: EINVAL: Invalid argument
2025-01-30T15:11:15Z app[d8dd9e3f214398] ord [info][    1.974333] reboot: Restarting system
2025-01-30T15:11:30Z app[d8dd9e3f214398] ord [info]2025-01-30T15:11:30.591568479 [01JJVW6RNK8RQ9AM0DMN33QGE3:main] Running Firecracker v1.7.0
2025-01-30T15:11:31Z app[d8dd9e3f214398] ord [info] INFO Starting init (commit: 676c82a4)...
2025-01-30T15:11:31Z app[d8dd9e3f214398] ord [info] INFO Checking filesystem on /data
2025-01-30T15:11:31Z app[d8dd9e3f214398] ord [info]/dev/vdc: clean, 19/64512 files, 8792/258048 blocks
2025-01-30T15:11:31Z app[d8dd9e3f214398] ord [info] INFO Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755
2025-01-30T15:11:31Z app[d8dd9e3f214398] ord [info] INFO Resized /data to 1056964608 bytes
2025-01-30T15:11:31Z app[d8dd9e3f214398] ord [info] INFO Preparing to run: `start` as root
2025-01-30T15:11:31Z app[d8dd9e3f214398] ord [info] INFO [fly api proxy] listening at /.fly/api
2025-01-30T15:11:31Z runner[d8dd9e3f214398] ord [info]Machine started in 1.104s
2025-01-30T15:11:31Z app[d8dd9e3f214398] ord [info]Provisioning standby
2025-01-30T15:11:31Z app[d8dd9e3f214398] ord [info]panic: failed to resolve member over dns: unable to resolve cloneable member
2025-01-30T15:11:31Z app[d8dd9e3f214398] ord [info]goroutine 1 [running]:
2025-01-30T15:11:31Z app[d8dd9e3f214398] ord [info]main.panicHandler({0xa0c0a0, 0xc0003b4ad0})
2025-01-30T15:11:31Z app[d8dd9e3f214398] ord [info]	/go/src/github.com/fly-apps/fly-postgres/cmd/start/main.go:190 +0x4a
2025-01-30T15:11:31Z app[d8dd9e3f214398] ord [info]main.main()
2025-01-30T15:11:31Z app[d8dd9e3f214398] ord [info]	/go/src/github.com/fly-apps/fly-postgres/cmd/start/main.go:67 +0xe65
2025-01-30T15:11:31Z app[d8dd9e3f214398] ord [info]2025/01/30 15:11:31 INFO SSH listening listen_address=[fdaa:a:ba15:a7b:2aa:5946:a3ac:2]:22 dns_server=[fdaa::3]:53
2025-01-30T15:11:32Z app[d8dd9e3f214398] ord [info] INFO Main child exited normally with code: 2
2025-01-30T15:11:32Z app[d8dd9e3f214398] ord [info] INFO Starting clean up.
2025-01-30T15:11:32Z app[d8dd9e3f214398] ord [info] INFO Umounting /dev/vdc from /data
2025-01-30T15:11:32Z app[d8dd9e3f214398] ord [info] WARN could not unmount /rootfs: EINVAL: Invalid argument
2025-01-30T15:11:32Z app[d8dd9e3f214398] ord [info][    1.963753] reboot: Restarting system
2025-01-30T15:11:46Z app[d8dd9e3f214398] ord [info]2025-01-30T15:11:46.354127882 [01JJVW6RNK8RQ9AM0DMN33QGE3:main] Running Firecracker v1.7.0
2025-01-30T15:11:47Z app[d8dd9e3f214398] ord [info] INFO Starting init (commit: 676c82a4)...
2025-01-30T15:11:47Z app[d8dd9e3f214398] ord [info] INFO Checking filesystem on /data
2025-01-30T15:11:47Z app[d8dd9e3f214398] ord [info]/dev/vdc: clean, 19/64512 files, 8792/258048 blocks
2025-01-30T15:11:47Z app[d8dd9e3f214398] ord [info] INFO Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755
2025-01-30T15:11:47Z app[d8dd9e3f214398] ord [info] INFO Resized /data to 1056964608 bytes
2025-01-30T15:11:47Z app[d8dd9e3f214398] ord [info] INFO Preparing to run: `start` as root
2025-01-30T15:11:47Z app[d8dd9e3f214398] ord [info] INFO [fly api proxy] listening at /.fly/api
2025-01-30T15:11:47Z runner[d8dd9e3f214398] ord [info]Machine started in 1.213s
2025-01-30T15:11:47Z app[d8dd9e3f214398] ord [info]Provisioning standby
2025-01-30T15:11:47Z app[d8dd9e3f214398] ord [info]panic: failed to resolve member over dns: unable to resolve cloneable member
2025-01-30T15:11:47Z app[d8dd9e3f214398] ord [info]goroutine 1 [running]:
2025-01-30T15:11:47Z app[d8dd9e3f214398] ord [info]main.panicHandler({0xa0c0a0, 0xc0003b4ac0})
2025-01-30T15:11:47Z app[d8dd9e3f214398] ord [info]	/go/src/github.com/fly-apps/fly-postgres/cmd/start/main.go:190 +0x4a
2025-01-30T15:11:47Z app[d8dd9e3f214398] ord [info]main.main()
2025-01-30T15:11:47Z app[d8dd9e3f214398] ord [info]	/go/src/github.com/fly-apps/fly-postgres/cmd/start/main.go:67 +0xe65
2025-01-30T15:11:47Z app[d8dd9e3f214398] ord [info]2025/01/30 15:11:47 INFO SSH listening listen_address=[fdaa:a:ba15:a7b:2aa:5946:a3ac:2]:22 dns_server=[fdaa::3]:53
2025-01-30T15:11:48Z app[d8dd9e3f214398] ord [info] INFO Main child exited normally with code: 2
2025-01-30T15:11:48Z app[d8dd9e3f214398] ord [info] INFO Starting clean up.
2025-01-30T15:11:48Z app[d8dd9e3f214398] ord [info] INFO Umounting /dev/vdc from /data
2025-01-30T15:11:48Z app[d8dd9e3f214398] ord [info] WARN could not unmount /rootfs: EINVAL: Invalid argument
2025-01-30T15:11:48Z app[d8dd9e3f214398] ord [info][    1.997023] reboot: Restarting system
2025-01-30T15:12:02Z app[d8dd9e3f214398] ord [info]2025-01-30T15:12:02.888481831 [01JJVW6RNK8RQ9AM0DMN33QGE3:main] Running Firecracker v1.7.0
2025-01-30T15:12:03Z app[d8dd9e3f214398] ord [info] INFO Starting init (commit: 676c82a4)...
2025-01-30T15:12:03Z app[d8dd9e3f214398] ord [info] INFO Checking filesystem on /data
2025-01-30T15:12:03Z app[d8dd9e3f214398] ord [info]/dev/vdc: clean, 19/64512 files, 8792/258048 blocks
2025-01-30T15:12:03Z app[d8dd9e3f214398] ord [info] INFO Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755
2025-01-30T15:12:03Z app[d8dd9e3f214398] ord [info] INFO Resized /data to 1056964608 bytes
2025-01-30T15:12:03Z app[d8dd9e3f214398] ord [info] INFO Preparing to run: `start` as root
2025-01-30T15:12:03Z app[d8dd9e3f214398] ord [info] INFO [fly api proxy] listening at /.fly/api
2025-01-30T15:12:03Z runner[d8dd9e3f214398] ord [info]Machine started in 1.153s
2025-01-30T15:12:04Z app[d8dd9e3f214398] ord [info]Provisioning standby
2025-01-30T15:12:04Z app[d8dd9e3f214398] ord [info]panic: failed to resolve member over dns: unable to resolve cloneable member
2025-01-30T15:12:04Z app[d8dd9e3f214398] ord [info]goroutine 1 [running]:
2025-01-30T15:12:04Z app[d8dd9e3f214398] ord [info]main.panicHandler({0xa0c0a0, 0xc00024cab0})
2025-01-30T15:12:04Z app[d8dd9e3f214398] ord [info]	/go/src/github.com/fly-apps/fly-postgres/cmd/start/main.go:190 +0x4a
2025-01-30T15:12:04Z app[d8dd9e3f214398] ord [info]main.main()
2025-01-30T15:12:04Z app[d8dd9e3f214398] ord [info]	/go/src/github.com/fly-apps/fly-postgres/cmd/start/main.go:67 +0xe65
2025-01-30T15:12:04Z app[d8dd9e3f214398] ord [info]2025/01/30 15:12:04 INFO SSH listening listen_address=[fdaa:a:ba15:a7b:2aa:5946:a3ac:2]:22 dns_server=[fdaa::3]:53
2025-01-30T15:12:04Z app[d8dd9e3f214398] ord [info] INFO Main child exited normally with code: 2
2025-01-30T15:12:04Z app[d8dd9e3f214398] ord [info] INFO Starting clean up.
2025-01-30T15:12:04Z app[d8dd9e3f214398] ord [info] INFO Umounting /dev/vdc from /data
2025-01-30T15:12:04Z app[d8dd9e3f214398] ord [info] WARN could not unmount /rootfs: EINVAL: Invalid argument
2025-01-30T15:12:04Z app[d8dd9e3f214398] ord [info][    1.985537] reboot: Restarting system
2025-01-30T15:12:16Z app[d8dd9e3f214398] ord [info]2025-01-30T15:12:16.043575691 [01JJVW6RNK8RQ9AM0DMN33QGE3:main] Running Firecracker v1.7.0
2025-01-30T15:12:16Z app[d8dd9e3f214398] ord [info] INFO Starting init (commit: 676c82a4)...
2025-01-30T15:12:17Z app[d8dd9e3f214398] ord [info] INFO Checking filesystem on /data
2025-01-30T15:12:17Z app[d8dd9e3f214398] ord [info]/dev/vdc: clean, 19/64512 files, 8792/258048 blocks
2025-01-30T15:12:17Z app[d8dd9e3f214398] ord [info] INFO Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755
2025-01-30T15:12:17Z app[d8dd9e3f214398] ord [info] INFO Resized /data to 1056964608 bytes
2025-01-30T15:12:17Z app[d8dd9e3f214398] ord [info] INFO Preparing to run: `start` as root
2025-01-30T15:12:17Z app[d8dd9e3f214398] ord [info] INFO [fly api proxy] listening at /.fly/api
2025-01-30T15:12:17Z runner[d8dd9e3f214398] ord [info]Machine started in 1.181s
2025-01-30T15:12:17Z app[d8dd9e3f214398] ord [info]Provisioning standby
2025-01-30T15:12:17Z app[d8dd9e3f214398] ord [info]2025/01/30 15:12:17 INFO SSH listening listen_address=[fdaa:a:ba15:a7b:2aa:5946:a3ac:2]:22 dns_server=[fdaa::3]:53
2025-01-30T15:12:17Z app[d8dd9e3f214398] ord [info]panic: failed to resolve member over dns: unable to resolve cloneable member
2025-01-30T15:12:17Z app[d8dd9e3f214398] ord [info]goroutine 1 [running]:
2025-01-30T15:12:17Z app[d8dd9e3f214398] ord [info]main.panicHandler({0xa0c0a0, 0xc0003b4ac0})
2025-01-30T15:12:17Z app[d8dd9e3f214398] ord [info]	/go/src/github.com/fly-apps/fly-postgres/cmd/start/main.go:190 +0x4a
2025-01-30T15:12:17Z app[d8dd9e3f214398] ord [info]main.main()
2025-01-30T15:12:17Z app[d8dd9e3f214398] ord [info]	/go/src/github.com/fly-apps/fly-postgres/cmd/start/main.go:67 +0xe65
2025-01-30T15:12:18Z app[d8dd9e3f214398] ord [info] INFO Main child exited normally with code: 2
2025-01-30T15:12:18Z app[d8dd9e3f214398] ord [info] INFO Starting clean up.
2025-01-30T15:12:18Z app[d8dd9e3f214398] ord [info] INFO Umounting /dev/vdc from /data
2025-01-30T15:12:18Z app[d8dd9e3f214398] ord [info] WARN could not unmount /rootfs: EINVAL: Invalid argument
2025-01-30T15:12:18Z app[d8dd9e3f214398] ord [info][    2.022140] reboot: Restarting system

I cannot see your account. I don’t work for Fly, and this forum is for community help only, not official support.

From the logs you shared, I’d be more concerned with this line:

panic: failed to resolve member over dns: unable to resolve cloneable member

Do you see anything odd when checking the machines’ IPs?
Are they all in the same region?
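One way to check the DNS side of those questions is to resolve the cluster's private hostname and see which IPv6 addresses come back. This is only a sketch: `resolve_ipv6` is a hypothetical helper, the hostname you pass in is your own, and it has to run from somewhere that can see the private network. An empty result would be consistent with the "failed to resolve member over dns" panic, though it would not explain it on its own.

```python
import socket


def resolve_ipv6(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv6 addresses for hostname, or [] if lookup fails.

    The resolver argument exists only so the lookup can be swapped out in tests.
    """
    try:
        infos = resolver(hostname, None, socket.AF_INET6)
    except socket.gaierror:
        # Name could not be resolved at all.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})
```

For example, `resolve_ipv6("my-db.internal")` (a placeholder name, not taken from this thread) would list the addresses the resolver returns for that name, which you can then compare against the machine IPs you expect.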

You mentioned that you first created a primary with only one replica and only later tried to add another, so I guess you didn’t use fly postgres create (which offers either a single node (Development) or 3 nodes (HA)). How did you create them? What commands are you running to try to create this new replica?

The initial database was created when I deployed from GitHub. I added the first replica from the dashboard.

I have since tried using clone from the fly command line, with the same results. And yes, they are all in the same region.

I have a feeling the only fix is to back up my database (or take a snapshot), make a new app-db with the scaling I want, restore the snapshot there, and then point the Rails app to the new db.

I really feel like there is a backend bug happening here, which I would like to report directly, but even their paid support doesn’t offer help with Postgres right now and there are no other methods of contact.

The following, more detailed version does work, last I heard:

https://community.fly.io/t/urgency-problems-with-postgres-the-database-is-not-responding/19926/2

(Probably it should be the official documentation…)

You override with a new user name but the old database name.

Thanks for sharing that; it’s basically what I just ended up doing. I forked my database to make a new app and attached it back to the main Rails app.
