Postgres deployment suddenly happened and is stuck (ewr)

eric-karambit-ai · May 12, 2022, 8:57pm

Not sure why, but my Postgres app deployed in ewr suddenly stopped running today and doesn’t seem to be successfully redeploying.

I don’t see anything on status.fly.io. Is there anything I can do to manually fix the deployment?

eric-karambit-ai · May 12, 2022, 10:14pm

the Postgres application has just been stuck like this

eric-karambit-ai · May 12, 2022, 10:14pm

I tried changing the scale settings to force a new deployment version, but the version number just ticks up without changing the deployment status.

kurt · May 12, 2022, 10:34pm

We’re looking at this one. It looks like it exited for some reason (could have been an OOM or similar) and it hasn’t been able to boot back up successfully. It should be up soon.

We highly recommend running two Postgres nodes if you want maximum db availability. It would have continued functioning just fine if a second node had been around to take over when the first failed.

eric-karambit-ai · May 12, 2022, 10:36pm

Is there an easy way/guide for me to change the current app to be multi-instance?

I’ve wanted to set that up for a while but had prioritized other things. Sometimes the right time just comes at you fast

kurt · May 12, 2022, 10:39pm

Yeah! Once it’s back up and healthy (make sure it’s doing what you want), run:

fly volumes create pg_data --size 10 --region ewr --no-encryption -a <db-name>
fly scale count 2 -a <db-name>

eric-karambit-ai · May 12, 2022, 10:43pm

So my current volume is not encrypted. Would there be an issue with creating new encrypted volumes and eventually sunsetting the original unencrypted one?

Appreciate your help as always @kurt. Also please feel free to point me to the right places in the docs. Happy to RTFM

eric-karambit-ai · May 13, 2022, 12:22am

I’ve gone through the commands you showed @kurt but don’t see evidence that this setup is clustered.

The first image is the deployment for the prod database and the second is for our staging database which was set up recently.

The first one doesn’t have (leader) or (replica) present in the status.

kurt · May 13, 2022, 12:33am

Oh, huh, this might be a Postgres standalone. We were deploying single instance postgres that couldn’t have replicas for a short time. What do you see when you run fly image show?

eric-karambit-ai · May 13, 2022, 12:38am

Deployment Status
  Registry   = registry-1.docker.io
  Repository = flyio/postgres-standalone
  Tag        = 14.1
  Version    = v0.0.7
  Digest     = sha256:ca27c53b81cae713e67d7ced87a4289961db4a81e382b09aaf42ea53032791eb
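
For anyone repeating this check, here is a minimal sketch of scripting it. It assumes the output of fly image show keeps the "Repository = ..." layout pasted above. is_standalone is a hypothetical helper name, not part of flyctl, and it only recognizes the flyio/postgres-standalone image named in this thread; anything else is reported as "other" without claiming it is clustered.

```shell
# Hypothetical helper: reads `fly image show` output on stdin and prints
# "standalone" if the Repository field ends in postgres-standalone,
# otherwise "other". Assumes the "Key = value" layout shown in this thread.
is_standalone() {
  awk -F'= *' '
    /^ *Repository/ {
      gsub(/[ \t]+$/, "", $2)   # strip trailing padding from the value
      repo = $2
    }
    END {
      if (repo ~ /postgres-standalone$/) {
        print "standalone"
      } else {
        print "other"
      }
    }'
}

# Usage (assumes flyctl is installed; replace <db-name> with your app name):
#   fly image show -a <db-name> | is_standalone
```

A "standalone" result matches the situation kurt describes: a single-instance image that could not have replicas at the time of this thread, so adding a volume and scaling the count would not produce a leader/replica pair.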

kurt · May 13, 2022, 12:39am

Ok, we’re going to look at how to turn this into a clustered install. Go ahead and remove that volume you created (fly volumes list should show the newest, then fly volumes delete <id>).

We’ll try and get back to you by tomorrow!

eric-karambit-ai · May 13, 2022, 12:40am

Fantastic, thank you!

eric-karambit-ai · October 28, 2022, 11:05pm

Hi @kurt. I hadn’t gotten around to doing anything about this since then, but my standalone Postgres instance was unfortunately impacted by today’s host failure in EWR.

Do you have any advice on how I could make it a clustered install?

eric-karambit-ai · October 29, 2022, 1:10pm

Also my Postgres instance never recovered from the outage and I’m not sure how to fix it. I’ve tried scaling to 0 and scaling up to 1 again but the errors persist.

2022-10-29T13:08:42.149 app[d87abda2] ewr [info] Failed to create required users: failed to connect to `host=fdaa:0:430c:a7b:ab2:0:726c:2 user=postgres database=postgres`: dial error (dial tcp [fdaa:0:430c:a7b:ab2:0:726c:2]:5432: connect: connection refused)

2022-10-29T13:08:42.715 app[d87abda2] ewr [info] exporter | ERRO[0006] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:430c:a7b:ab2:0:726c:2]:5432/postgres?sslmode=disable): dial tcp [fdaa:0:430c:a7b:ab2:0:726c:2]:5432: connect: connection refused source="postgres_exporter.go:1658"

2022-10-29T13:08:43.149 app[d87abda2] ewr [info] Failed to create required users: failed to connect to `host=fdaa:0:430c:a7b:ab2:0:726c:2 user=postgres database=postgres`: dial error (dial tcp [fdaa:0:430c:a7b:ab2:0:726c:2]:5432: connect: connection refused)
