An unknown error occurred re postgres

I scaled all my apps down a few days ago because development had paused and I had other work to do. This included a postgres app. I’ve now come back to it and am trying to redeploy, but I’m running into the following opaque errors. Any idea where I should look to figure this out? My deploy script usually works without problems.

+ docker run -i --rm -v /.../<appname>:/.../<appname>:ro -v /var/run/docker.sock:/var/run/docker.sock -v /.../.fly:/.fly -w /.../<appname> --network host --env FLY_NO_UPDATE_CHECK=1 --env FLY_API_TOKEN --env BUILDKIT_PROGRESS=plain flyio/flyctl:v0.0.259 postgres db list rpt-rinkeby-postgres
Error An unknown error occured.


+ flyctl postgres detach --postgres-app rpt-rinkeby-postgres -a rpt-rinkeby-graph
++ pwd
++ pwd
++ pwd
Detaching...⣻ Error This version of PG Detach has been removed. Please run `flyctl version update`.
+ true
+ flyctl postgres --verbose attach --postgres-app rpt-rinkeby-postgres -a rpt-rinkeby-graph --database-name graph_chain_3c5acd85
++ pwd
++ pwd
++ pwd
Attaching...⣻ Error An unknown error occured.

When you say scaled down … do you mean scaled them down to zero? By doing something like fly scale count 0?

If so … it’s possible the error is due to there being no vms: with the number of allowed vms set at 0, there is no vm to attach to, deploy to, etc.

If that’s the case, try fly scale count 1 (or however many vms you want), wait a minute, and then use fly status to check that 1 (or however many) vms are now listed as running.

Then you can retry the attach command, or any other, and that may have fixed it.

Else it’s not that.
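In case it helps, the "scale up, wait, retry" steps above can be wrapped in a small retry helper. This is only a rough sketch: the `retry` function is something I made up for illustration (it is not part of flyctl), and the app names in the usage comments are placeholders.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Hypothetical helper, not a flyctl feature.
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    # Stop as soon as the wrapped command exits successfully.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example usage (placeholder app names; attach syntax copied from the post
# above and may differ between flyctl versions):
#   fly scale count 1 --app my-postgres-app
#   retry 5 15 fly postgres attach --postgres-app my-postgres-app -a my-app
```

The delay gives the vm time to come up between attempts. If the command still fails after the last attempt, the helper returns non-zero, so a deploy script can stop there instead of carrying on.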

Yes, I meant that I scaled them down to 0 with fly scale count 0.

I did then scale them back up, though. I notably forgot to scale the postgres app back up before trying to deploy, realized what I did, scaled it up, and then tried again. That’s when the error kicked in. I now can’t get this to work at all and it’s frustrating. I think I’m just going to delete these apps and rebuild them.

Well, this is annoying. So I deleted everything and restarted them, but now it appears that two of the five applications are just not responding. Requesting logs isn’t getting anything either. Going to try and do this again…

EDIT: They seem to be stuck in pending. The material consequence of this is lots of these:

deploy-flyctl-daemon-1  | 2022/04/11 15:47:34 [1] err handling resolve: top1.nearest.of.rpt-rinkeby-ipfs.internal:5001 - no such host
deploy-flyctl-daemon-1  | 2022/04/11 15:47:34 [1] incoming command: [resolve personal top1.nearest.of.rpt-rinkeby-graph.internal:8020]
deploy-flyctl-daemon-1  | 2022/04/11 15:47:34 [1] incoming command: [resolve personal top1.nearest.of.rpt-rinkeby-ipfs.internal:5001]
deploy-flyctl-daemon-1  | 2022/04/11 15:47:34 [1] err handling resolve: top1.nearest.of.rpt-rinkeby-graph.internal:8020 - no such host

I wonder if the problem is actually postgres. I just blew everything up again, including the api node I had kept going, and then started over. The first part is making a postgres db and that seems to be stalled. After running the fly postgres create command, I am getting the following stalled output.

Postgres cluster rpt-rinkeby-postgres created
Username:    postgres
Password:    <password>
Hostname:    <hostname>
Proxy Port:  <port>
PG Port: <port>
Save your credentials in a secure place, you won't be able to see them again!

Monitoring Deployment

1 desired, 1 placed, 0 healthy, 0 unhealthy

But it’s not finishing and the online interface is just showing pending.

Is the postgres create part down?

Looks like @kurt has responded to some similar errors in the past. I don’t think postgres create is working at all right now…

Hmm. That doesn’t sound good.

You could try these just to see if they show anything new:

fly status --app database-name
fly volumes list --app database-name

They should show the status of the database app (the vm bit) and the volumes it uses to store the data. Both need to be ok, and based on the ‘pending’ state and the logs output, it certainly sounds like one or both is not.
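If you end up running those two checks a lot, a tiny wrapper keeps them together. This is just a sketch: `pg_diagnose` and the `FLY` override variable are names I invented here, and the app name in the usage comment is a placeholder.

```shell
# pg_diagnose APP: print the vm status and the volume list for one app.
# FLY is a hypothetical override for the CLI binary (defaults to "fly"),
# e.g. to point at a wrapper script or a docker-based flyctl.
pg_diagnose() {
  app=$1
  fly_bin=${FLY:-fly}
  echo "== status for $app =="
  "$fly_bin" status --app "$app"
  echo "== volumes for $app =="
  "$fly_bin" volumes list --app "$app"
}

# Example usage (placeholder app name):
#   pg_diagnose my-postgres-app
```

It only runs the same two commands shown above, so it will not reveal anything they do not.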

Unfortunately this’ll take someone at Fly to investigate. As you say, Kurt (or someone) can usually sort this.

@cinjon

Looks like there’s an issue with the host your volume is on. We are actively working on this and will keep you posted.

@cinjon Looks like things should be settled now. Can you confirm things are looking better on your end?

Apologies if this is an unrelated issue, but I attempted to scale the ram on a postgres application a few hours ago and the deployment remained stuck in a pending state for about 15 minutes. I assumed something had silently failed, so I scaled the pg instance count to 0 and then back up to 1, and now postgres appears to be stuck in a crashed state. Attempts to stop or restart the vm via flyctl also result in an An unknown error occured message similar to the OP’s. So far nothing I’ve tried has restored the postgres instance to a functional state. Hopefully that’s helpful for debugging this @shaun.

Looks like it … redeploying now.

I chatted with @shaun and @kurt outside of this thread and this was a combination of a UI bug when scaling postgres ram from the web UI (for now, do this from the CLI) and an internal provisioning issue. Per @kurt, if it takes more than 2 minutes to scale ram on an application, reach out to the team. Thanks all. Sorry for hijacking your thread @cinjon :slight_smile:

Got it, thanks for the update and for being on top of this! Much appreciated, all :slight_smile:

Hi, it looks like we’re getting this error again. This is urgent because we are unable to use our postgres database. Did I trigger the same/similar bug on your end?

+ flyctl postgres db list rpt-rinkeby-postgres
+ grep graph_chain_29291a4f
++ pwd
++ pwd
++ pwd
+ docker run -i --rm -v /path.../:/path:ro -v /var/run/docker.sock:/var/run/docker.sock -v /path/.fly:/.fly -w /path --network host --env FLY_NO_UPDATE_CHECK=1 --env FLY_API_TOKEN --env BUILDKIT_PROGRESS=plain flyio/flyctl:v0.0.259 postgres db list rpt-rinkeby-postgres
Error An unknown error occured.


+ flyctl postgres detach --postgres-app rpt-rinkeby-postgres -a rpt-rinkeby-graph
++ pwd
++ pwd
++ pwd
Detaching...⣻ Error This version of PG Detach has been removed. Please run `flyctl version update`.
Attaching...⣷ Error An unknown error occured
kill 0

@cinjon Can you update flyctl and try again? flyctl version update

I just did that and upgraded to flyctl v0.0.320 darwin/amd64 Commit: 58aae1e BuildDate: 2022-04-13T15:58:08Z.

Same errors around postgres detach, etc. Is this possibly because I’m using v0.0.259 in the build commands? I’m hesitant to change that because I am unfamiliar with what has advanced between these versions and the app runs on this.

I went ahead and tried it anyways, changing all instances of 259 in the docker setup to 320 per the releases shown here: Docker Hub.

This did not work and failed with an error that is hard to make use of:

deployment-1650116695: digest: sha256:7edc1f33335d48f551cf70586d2f43a2f623817960b1f627ffb5cda3ae22286a size: 3037
--> Pushing image done
image: registry.fly.io/rpt-rinkeby-ipfs:deployment-1650116695
image size: 83 MB
==> Creating release
Error An unknown error occured.


+ kill 0

Well, that’s disappointing. And you’re right, that unknown error message is not enlightening. We’ll need further input from someone with more insight into this context.

Keeping flyctl up-to-date is necessary, though: flyctl commands are how you interact with our API, so you need the flyctl version that knows the latest API changes. This was an example:

Error This version of PG Detach has been removed. Please run `flyctl version update`.

Any update on this? Maybe @shaun or @kurt?

flyctl needs to be current to work with postgres. Postgres operations are direct flyctl -> postgres vm commands. These changed recently so that’s what’s breaking.
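For deploy scripts that pin a flyctl docker tag, a quick guard can flag a stale pin before any postgres command runs. This is a rough sketch under assumptions: `version_lt` is a made-up helper, it relies on `sort -V` being available, and the `MINIMUM` value is simply the version mentioned earlier in this thread, not an official floor.

```shell
# version_lt A B: succeeds when version A sorts strictly before version B.
# Relies on `sort -V` (version sort), found in GNU coreutils and recent BSD sort.
version_lt() {
  a=${1#v}   # strip an optional leading "v"
  b=${2#v}
  [ "$a" != "$b" ] &&
    [ "$(printf '%s\n%s\n' "$a" "$b" | sort -V | head -n 1)" = "$a" ]
}

PINNED=v0.0.259    # tag used in the docker run commands in this thread
MINIMUM=v0.0.320   # version reported after the update above; an assumed floor

if version_lt "$PINNED" "$MINIMUM"; then
  echo "flyctl image $PINNED is older than $MINIMUM; consider bumping the tag"
fi
```

Keeping the tag in one variable also means a bump happens in one place rather than across every docker run line.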