Postgres deployment suddenly happened and is stuck (ewr)

Not sure why, but my Postgres app deployed in ewr suddenly stopped running today and doesn’t seem to be successfully redeploying.

I don’t see anything on status.fly.io. Is there anything I can do to manually fix the deployment?

The Postgres application has just been stuck like this.

I tried changing the scale settings to force a new deployment version, but the version number ticks up without the deployment status changing.

We’re looking at this one. It looks like it exited for some reason (could have been an OOM or similar) and it hasn’t been able to boot back up successfully. It should be up soon.

We highly recommend running two postgres nodes if you want max db availability. It would have continued functioning just fine if a second node had been around to take over when the first failed.

Is there an easy way/guide for me to change the current app to be multi-instance?

I’ve wanted to set that up for a while but had prioritized other things. Sometimes the right time just comes at you fast :slight_smile:

Yeah! Once it’s back up and healthy (make sure it’s doing what you want), run:

fly volumes create pg_data --size 10 --region ewr --no-encryption -a <db-name>
fly scale count 2 -a <db-name>

So my current volume is not encrypted. Would there be an issue with creating new encrypted volumes and eventually sunsetting the original unencrypted one?

Appreciate your help as always @kurt. Also please feel free to point me to the right places in the docs. Happy to RTFM

I’ve gone through the commands you showed @kurt but don’t see evidence that this setup is clustered.

The first image is the deployment for the prod database and the second is for our staging database which was set up recently.

The first one doesn’t have (leader) or (replica) present in the status.
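
For anyone who wants to script this check rather than eyeball it, here is a minimal sketch. It assumes the `fly status` output is piped in on stdin and that clustered instances are tagged with the literal strings `(leader)` or `(replica)`, as described above. The function name `has_cluster_roles` is made up for illustration and is not part of flyctl.

```shell
# Hypothetical helper: succeeds if the text on stdin mentions a
# (leader) or (replica) role anywhere, and fails otherwise.
has_cluster_roles() {
  grep -Eq '\((leader|replica)\)'
}

# Example use (app name is a placeholder):
#   fly status -a <db-name> | has_cluster_roles && echo "looks clustered"
```

It only looks for the role markers, so it says nothing about whether the instances are actually healthy.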

Oh, huh, this might be a Postgres standalone. We were deploying single instance postgres that couldn’t have replicas for a short time. What do you see when you run fly image show?

Deployment Status
  Registry   = registry-1.docker.io                                                     
  Repository = flyio/postgres-standalone                                                
  Tag        = 14.1                                                                     
  Version    = v0.0.7                                                                   
  Digest     = sha256:ca27c53b81cae713e67d7ced87a4289961db4a81e382b09aaf42ea53032791eb  
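
The `Repository` line is the part that matters here: it names `flyio/postgres-standalone`. Below is a rough sketch of the same check in shell, assuming the `Repository = ...` line format shown in the output above. `is_standalone_image` is a hypothetical name, not a flyctl feature.

```shell
# Hypothetical helper: reads `fly image show` output on stdin and succeeds
# only if the Repository line names the standalone Postgres image.
is_standalone_image() {
  grep -Eq '^[[:space:]]*Repository[[:space:]]*=[[:space:]]*flyio/postgres-standalone[[:space:]]*$'
}

# Example use (app name is a placeholder):
#   fly image show -a <db-name> | is_standalone_image && echo "standalone image"
```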

Ok, we’re going to look at how to turn this into a clustered install. Go ahead and remove that volume you created (fly volumes list should show the newest, then fly volumes delete <id>).

We’ll try and get back to you by tomorrow!

Fantastic, thank you!

Hi @kurt. I hadn’t gotten around to doing anything about this since, but my standalone Postgres instance was unfortunately impacted by today’s host failure in EWR.

Do you have any advice on how I could make it a clustered install?

Also my Postgres instance never recovered from the outage and I’m not sure how to fix it. I’ve tried scaling to 0 and scaling up to 1 again but the errors persist.

2022-10-29T13:08:42.149 app[d87abda2] ewr [info] Failed to create required users: failed to connect to `host=fdaa:0:430c:a7b:ab2:0:726c:2 user=postgres database=postgres`: dial error (dial tcp [fdaa:0:430c:a7b:ab2:0:726c:2]:5432: connect: connection refused)

2022-10-29T13:08:42.715 app[d87abda2] ewr [info] exporter | ERRO[0006] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:430c:a7b:ab2:0:726c:2]:5432/postgres?sslmode=disable): dial tcp [fdaa:0:430c:a7b:ab2:0:726c:2]:5432: connect: connection refused source="postgres_exporter.go:1658"

2022-10-29T13:08:43.149 app[d87abda2] ewr [info] Failed to create required users: failed to connect to `host=fdaa:0:430c:a7b:ab2:0:726c:2 user=postgres database=postgres`: dial error (dial tcp [fdaa:0:430c:a7b:ab2:0:726c:2]:5432: connect: connection refused)