`fly postgres create` is unresponsive and hangs

$ fly version
  fly v0.0.429 darwin/arm64 Commit: 6db701e0 BuildDate: 2022-11-04T13:45:42Z

I’m creating a managed Postgres instance, and am running the command below:

fly postgres create --region sin \
  --name foo \
  --vm-size shared-cpu-1x \
  --volume-size 1 \
  --initial-cluster-size 1

This does not complete, and eventually hangs.
The immediate output before hanging is:

automatically selected personal organization: <my name>
Creating postgres cluster in organization personal
Creating app...
Setting secrets on app foo..Provisioning 1 of 1 machines with image flyio/postgres:14.4

After a ^C, I can see that the DB is created as the corresponding app is now listed on my Fly.io dashboard, but I don’t get a chance to see the values of the secrets associated with the database, so I can’t verify connectivity.

I tried destroying that and recreating with LOG_LEVEL=debug, and the final request made before a hang is:

DEBUG --> POST http://[fdaa:0:b744::3]:4280/v1/apps/foo/machines

{
  "appId": "foo",
  ...
}
DEBUG {0x140011bdb00}

I remember that this was working great in v0.0.427 when I was using it a few weeks ago, so I’m not sure if something’s changed with how it’s used now, or if something is legitimately broken in the system at this time.

The same things happens to me as well. This is very frustrating tbh. I thought using Fly is supposed to simply the process.

(venv) ⚡ web [master] fly version
fly v0.0.437 darwin/amd64 Commit: 7073cc94 BuildDate: 2022-12-01T19:56:44Z

I’m having the same issue as @monological . I’m deploying a cluster of 2, the first machine seems to be deployed but the second one hangs on “Waiting for machine to start”

Screenshot_2022-12-08_13-33-00

UPDATE: launching a dev environment (one shared machine) worked.

Launching a cluster of 2 shared-cpu-1x also worked (it hung waiting for an instance to become healthy but eventually finished).

Launching a cluster of 2 shared-cpu-2x just worked. I guess there was some internal issue that got resolved.

Came back to give this another red hot go in case anything’s changed, but flyctl now complains that my account has been auto-flagged by the fraud system. Oh well. :sweat_smile:

fly postgres create is completely broken for me too, and hangs indefinitely on the first of three health checks:

$ fly postgres create
? Choose an app name (leave blank to generate one): my-db
automatically selected personal organization: David Celis
? Select region: Seattle, Washington (US) (sea)
? Select configuration: Development - Single node, 1x shared CPU, 256MB RAM, 1GB disk
Creating postgres cluster in organization personal
Creating app...
Setting secrets on app nook-stop-db...
Provisioning 1 of 1 machines with image flyio/postgres:14.6
Waiting for machine to start...
Machine 3d8dd57ce21738 is created
==> Monitoring health checks
  Waiting for 3d8dd57ce21738 to become healthy (started, 0/3)

If I ^C out of this, I’ll get some more output along with credentials that makes it seem like my database actually was created…

==> Monitoring health checks
  Waiting for 3d8dd57ce21738 to become healthy (started, 0/3)
^C
Postgres cluster my-db created
  Username:    postgres
  Password:    <snip>
  Hostname:    nook-stop-db.internal
  Proxy port:  5432
  Postgres port:  5433
  Connection string: postgres://postgres:<snip>@my-db.internal:5432

Save your credentials in a secure place -- you won't be able to see them again!

Connect to postgres
Any app within the David Celis organization can connect to this Postgres using the above connection string

Now that you've set up Postgres, here's what you need to understand: https://fly.io/docs/postgres/getting-started/what-you-should-know/

… but attempting to do anything with it will give me 500 errors:

$ fly postgres attach my-db -a my-db
Error machines could not be retrieved failed to list VMs: request returned non-2xx status, 500

Hi @davidcelis, we’re rolling out a fix related to querying health checks now which is likely affecting the deployment.

Aha, thanks! I’ll try again later, then.

I just checked and the fix has been deployed to the host where the machine for nook-stop-db was deployed so you should be good now.

1 Like