Postgres DB is not reachable

Everything had been running fine for the last 4 months. This morning, I started getting this issue.

healthcheck ❌
error: PrismaClientKnownRequestError: Can't reach database server at `maa.top2.nearest.of.letterhive-db.internal`:`5432`
Please make sure your database server is running at `maa.top2.nearest.of.letterhive-db.internal`:`5432`.

I tried connecting to the instance, but it says:

Connecting to letterhive-db.internal...⣾ Error error connecting to SSH server: dial: lookup letterhive-db.internal. on fdaa:0:71d2::3: no such host
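
For reference, this is roughly how the internal DNS can be sanity-checked (just a sketch; <app-name> is a placeholder for one of your apps in the same organization, and it assumes a flyctl version that has the dig subcommand):

# Ask Fly's internal resolver for the database app's AAAA record
fly dig aaaa letterhive-db.internal -a <app-name>

# Or from inside an app VM (fly ssh console -a <app-name>), if dig is installed in the image:
dig aaaa letterhive-db.internal +short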

Then I thought a redeploy would fix it, so I tried upgrading the Postgres instance. But now it is stuck at pending with an error.

? Update `letterhive-db` from flyio/postgres:14.3 v0.0.23 to flyio/postgres:14.4 v0.0.32? Yes
Release v1 created

You can detach the terminal anytime without stopping the update
==> Monitoring deployment

 1 desired, 1 placed, 0 healthy, 1 unhealthy
--> v1 failed - Failed due to unhealthy allocations and deploying as v2 

--> Troubleshooting guide at https://fly.io/docs/getting-started/troubleshooting/
Error abort

I can’t figure out the reason. Please help.

We are also facing the same issue on our staging and production apps, which use the same region (Chennai - maa).

ActiveRecord::ConnectionNotEstablished
could not translate host name "top2.nearest.of.miru-staging-db.internal" to address: Temporary failure in name resolution.

Fly team, please help us resolve this.

It seems to be resolved now. Can you check if it works for you as well, @akhilgkrishnan?

It's still not working for me.

@akhilgkrishnan Hmmm, my guess is there's a DNS issue somewhere, based on the "temporary failure in name resolution" error. Is your db listening on IPv6 on the correct port? Can you share your fly.toml settings?
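
For example, from inside the database VM (fly ssh console into the db app), something like this should show whether anything is bound to port 5432 on an IPv6 address (a sketch; whether ss or netstat is present depends on the image):

# Inside the db VM: list listening TCP sockets and look for an IPv6 bind on 5432, e.g. [::]:5432
ss -tln | grep 5432
# or, if ss isn't available:
netstat -tln | grep 5432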

@shortdiv The db is in the maa region. Can you check this ASAP?

# fly.toml file generated for miru-web on 2022-10-05T10:02:38+05:30

app = "miru-web"
kill_signal = "SIGINT"
kill_timeout = 5
processes = []

[build]
  [build.args]
    BUNDLER_VERSION = "2.3.11"
    NODE_VERSION = "16.4.2"
    RUBY_VERSION = "3.1.2"

[deploy]
  release_command = ""

[env]
  PORT = "8080"
  SERVER_COMMAND = "bin/rails fly:server"
  APP_BASE_URL = "https://miru-web.fly.dev"

[experimental]
  allowed_public_ports = []
  auto_rollback = true

[[services]]
  http_checks = []
  internal_port = 8080
  processes = ["web"]
  protocol = "tcp"
  script_checks = []
  [services.concurrency]
    hard_limit = 25
    soft_limit = 20
    type = "connections"

  [[services.ports]]
    force_https = true
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    grace_period = "30s"
    interval = "15s"
    restart_limit = 0
    timeout = "4s"

[[statics]]
  guest_path = "/app/public"
  url_prefix = "/"

[processes]
  web = "bundle exec puma -C config/puma.rb"
  worker = "bundle exec sidekiq -e production -C config/sidekiq.yml"

Hmmm, can you ssh into the db instance? Try running fly ssh console
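
Once you're in, a few quick things are worth checking (a sketch; the /data paths and the stolonctl usage are assumptions based on Fly's stolon-based Postgres image):

# Is Postgres actually running?
ps aux | grep postgres

# Is the data volume full?
df -h /data

# Stolon's view of the cluster (loading its connection settings from the env file first)
export $(cat /data/.env | xargs)
stolonctl status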

@shortdiv I'm able to get into the console.

DB Log

2022-10-21T06:15:51.694 app[6b06793b] maa [info] sentinel | goroutine 2086 [running]:

2022-10-21T06:15:51.694 app[6b06793b] maa [info] sentinel | github.com/superfly/leadership.(*Candidate).initLock(0xc000136c40)

2022-10-21T06:15:51.694 app[6b06793b] maa [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:98 +0x2e

2022-10-21T06:15:51.694 app[6b06793b] maa [info] sentinel | github.com/superfly/leadership.(*Candidate).campaign(0xc000136c40)

2022-10-21T06:15:51.694 app[6b06793b] maa [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:124 +0xc6

2022-10-21T06:15:51.694 app[6b06793b] maa [info] sentinel | created by github.com/superfly/leadership.(*Candidate).RunForElection

2022-10-21T06:15:51.694 app[6b06793b] maa [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:60 +0xc5

2022-10-21T06:15:51.696 app[6b06793b] maa [info] sentinel | exit status 2

2022-10-21T06:15:51.696 app[6b06793b] maa [info] sentinel | restarting in 3s [attempt 1]

2022-10-21T06:15:54.696 app[6b06793b] maa [info] sentinel | Running...

2022-10-21T06:24:03.689 app[a00f4f14] maa [info] sentinel | 2022-10-21T06:24:03.689Z ERROR cmd/sentinel.go:1852 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-10-21T06:24:03.697 app[6b06793b] maa [info] sentinel | 2022-10-21T06:24:03.697Z ERROR cmd/sentinel.go:1889 cannot update sentinel info {"error": "Unexpected response code: 500 (rpc error making call: node is not the leader)"}

2022-10-21T06:24:10.786 app[6b06793b] maa [info] keeper | 2022-10-21T06:24:10.786Z ERROR cmd/keeper.go:870 failed to update keeper info {"error": "Unexpected response code: 500 (No cluster leader)"}

2022-10-21T06:40:02.704 app[a00f4f14] maa [info] keeper | 2022-10-21T06:40:02.703Z ERROR cmd/keeper.go:870 failed to update keeper info {"error": "Unexpected response code: 500 (leadership lost while committing log)"}

2022-10-21T06:40:03.187 app[6b06793b] maa [info] keeper | 2022-10-21T06:40:03.186Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-10-21T06:40:03.231 app[6b06793b] maa [info] sentinel | 2022-10-21T06:40:03.229Z ERROR cmd/sentinel.go:1889 cannot update sentinel info {"error": "Unexpected response code: 500 (rpc error making call: leadership lost while committing log)"}

2022-10-21T06:40:03.310 app[6b06793b] maa [info] keeper | 2022-10-21T06:40:03.310Z ERROR cmd/keeper.go:870 failed to update keeper info {"error": "Unexpected response code: 500 (rpc error making call: leadership lost while committing log)"}

2022-10-21T07:01:23.312 app[6b06793b] maa [info] keeper | 2022-10-21T07:01:23.312Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-10-21T07:01:23.358 app[a00f4f14] maa [info] sentinel | 2022-10-21T07:01:23.358Z ERROR cmd/sentinel.go:1852 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-10-21T07:01:23.392 app[a00f4f14] maa [info] keeper | 2022-10-21T07:01:23.392Z ERROR cmd/keeper.go:870 failed to update keeper info {"error": "Unexpected response code: 500 (rpc error making call: node is not the leader)"}

2022-10-21T07:23:27.642 app[6b06793b] maa [info] keeper | 2022-10-21T07:23:27.642Z ERROR cmd/keeper.go:870 failed to update keeper info {"error": "Unexpected response code: 500 (rpc error making call: leadership lost while committing log)"}

2022-10-21T07:23:27.756 app[6b06793b] maa [info] sentinel | 2022-10-21T07:23:27.756Z ERROR cmd/sentinel.go:1947 error saving clusterdata {"error": "Unexpected response code: 500 (rpc error making call: leadership lost while committing log)"}

2022-10-21T07:53:53.110 app[6b06793b] maa [info] keeper | 2022-10-21T07:53:53.110Z ERROR cmd/keeper.go:870 failed to update keeper info {"error": "Unexpected response code: 500 (leadership lost while committing log)"}

2022-10-21T07:53:53.513 app[a00f4f14] maa [info] keeper | 2022-10-21T07:53:53.513Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-10-21T07:54:01.072 app[a00f4f14] maa [info] sentinel | 2022-10-21T07:54:01.071Z ERROR cmd/sentinel.go:102 election loop error {"error": "failed to read lock: Unexpected response code: 500"}

2022-10-21T08:43:01.102 app[6b06793b] maa [info] keeper | 2022-10-21T08:43:01.101Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-10-21T08:43:01.266 app[a00f4f14] maa [info] keeper | 2022-10-21T08:43:01.265Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-10-21T08:43:01.729 app[a00f4f14] maa [info] sentinel | 2022-10-21T08:43:01.729Z ERROR cmd/sentinel.go:1889 cannot update sentinel info {"error": "Unexpected response code: 500 (leadership lost while committing log)"}

2022-10-21T08:43:07.805 app[a00f4f14] maa [info] sentinel | 2022-10-21T08:43:07.805Z ERROR cmd/sentinel.go:102 election loop error {"error": "failed to read lock: Unexpected response code: 500"}

2022-10-21T08:43:09.009 app[6b06793b] maa [info] sentinel | 2022-10-21T08:43:09.008Z ERROR cmd/sentinel.go:102 election loop error {"error": "Unexpected response code: 500 (No cluster leader)"}

2022-10-21T08:43:33.066 app[a00f4f14] maa [info] sentinel | 2022-10-21T08:43:33.066Z ERROR cmd/sentinel.go:1947 error saving clusterdata {"error": "Unexpected response code: 500 (rpc error making call: leadership lost while committing log)"}

2022-10-21T08:43:33.198 app[a00f4f14] maa [info] keeper | 2022-10-21T08:43:33.197Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-10-21T08:43:33.378 app[6b06793b] maa [info] keeper | 2022-10-21T08:43:33.377Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-10-21T08:43:33.398 app[6b06793b] maa [info] sentinel | 2022-10-21T08:43:33.397Z ERROR cmd/sentinel.go:1852 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-10-21T08:43:34.710 app[a00f4f14] maa [info] sentinel | panic: close of closed channel

2022-10-21T08:43:34.710 app[a00f4f14] maa [info] sentinel |

2022-10-21T08:43:34.710 app[a00f4f14] maa [info] sentinel | goroutine 6819 [running]:

2022-10-21T08:43:34.710 app[a00f4f14] maa [info] sentinel | github.com/superfly/leadership.(*Candidate).initLock(0xc0000ae000)

2022-10-21T08:43:34.710 app[a00f4f14] maa [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:98 +0x2e

2022-10-21T08:43:34.710 app[a00f4f14] maa [info] sentinel | github.com/superfly/leadership.(*Candidate).campaign(0xc0000ae000)

2022-10-21T08:43:34.710 app[a00f4f14] maa [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:124 +0xc6

2022-10-21T08:43:34.710 app[a00f4f14] maa [info] sentinel | created by github.com/superfly/leadership.(*Candidate).RunForElection

2022-10-21T08:43:34.710 app[a00f4f14] maa [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:60 +0xc5

2022-10-21T08:43:34.711 app[a00f4f14] maa [info] sentinel | exit status 2

2022-10-21T08:43:34.711 app[a00f4f14] maa [info] sentinel | restarting in 3s [attempt 42]

2022-10-21T08:43:37.712 app[a00f4f14] maa [info] sentinel | Running...

2022-10-21T08:43:42.790 app[6b06793b] maa [info] sentinel | panic: close of closed channel

2022-10-21T08:43:42.790 app[6b06793b] maa [info] sentinel |

2022-10-21T08:43:42.790 app[6b06793b] maa [info] sentinel | goroutine 5778 [running]:

2022-10-21T08:43:42.790 app[6b06793b] maa [info] sentinel | github.com/superfly/leadership.(*Candidate).initLock(0xc0000ae000)

2022-10-21T08:43:42.790 app[6b06793b] maa [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:98 +0x2e

2022-10-21T08:43:42.790 app[6b06793b] maa [info] sentinel | github.com/superfly/leadership.(*Candidate).campaign(0xc0000ae000)

2022-10-21T08:43:42.790 app[6b06793b] maa [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:124 +0xc6

2022-10-21T08:43:42.790 app[6b06793b] maa [info] sentinel | created by github.com/superfly/leadership.(*Candidate).RunForElection

2022-10-21T08:43:42.790 app[6b06793b] maa [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:60 +0xc5

2022-10-21T08:43:42.792 app[6b06793b] maa [info] sentinel | exit status 2

2022-10-21T08:43:42.792 app[6b06793b] maa [info] sentinel | restarting in 3s [attempt 2]

2022-10-21T08:43:45.793 app[6b06793b] maa [info] sentinel | Running...

2022-10-21T09:48:26.669 app[a00f4f14] maa [info] sentinel | 2022-10-21T09:48:26.669Z WARN cmd/sentinel.go:276 no keeper info available {"db": "250e7205", "keeper": "14bf234272"}

2022-10-21T09:50:50.280 app[a00f4f14] maa [info] keeper | 2022-10-21T09:50:50.279Z ERROR cmd/keeper.go:870 failed to update keeper info {"error": "Unexpected response code: 500 (leadership lost while committing log)"}

2022-10-21T09:50:51.104 app[a00f4f14] maa [info] sentinel | 2022-10-21T09:50:51.104Z ERROR cmd/sentinel.go:1852 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-10-21T09:51:30.572 app[a00f4f14] maa [info] sentinel | 2022-10-21T09:51:30.572Z ERROR cmd/sentinel.go:1889 cannot update sentinel info {"error": "Unexpected response code: 500 (rpc error making call: leadership lost while committing log)"}

2022-10-21T09:51:37.486 app[6b06793b] maa [info] sentinel | 2022-10-21T09:51:37.485Z ERROR cmd/sentinel.go:102 election loop error {"error": "Unexpected response code: 500 (No cluster leader)"}

2022-10-21T09:53:34.554 app[6b06793b] maa [info] keeper | 2022-10-21T09:53:34.554Z ERROR cmd/keeper.go:870 failed to update keeper info {"error": "Unexpected response code: 500 (leadership lost while committing log)"}

2022-10-21T09:53:34.563 app[a00f4f14] maa [info] keeper | 2022-10-21T09:53:34.559Z ERROR cmd/keeper.go:870 failed to update keeper info {"error": "Unexpected response code: 500 (leadership lost while committing log)"}

2022-10-21T09:53:34.615 app[a00f4f14] maa [info] sentinel | 2022-10-21T09:53:34.615Z ERROR cmd/sentinel.go:1852 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-10-21T09:53:34.728 app[6b06793b] maa [info] sentinel | 2022-10-21T09:53:34.728Z ERROR cmd/sentinel.go:1852 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-10-21T09:53:34.741 app[6b06793b] maa [info] keeper | 2022-10-21T09:53:34.740Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-10-21T09:53:42.141 app[a00f4f14] maa [info] sentinel | 2022-10-21T09:53:42.141Z ERROR cmd/sentinel.go:102 election loop error {"error": "Unexpected response code: 500 (No cluster leader)"}

2022-10-21T09:54:01.543 app[6b06793b] maa [info] sentinel | panic: close of closed channel

2022-10-21T09:54:01.543 app[6b06793b] maa [info] sentinel |

2022-10-21T09:54:01.543 app[6b06793b] maa [info] sentinel | goroutine 3244 [running]:

2022-10-21T09:54:01.543 app[6b06793b] maa [info] sentinel | github.com/superfly/leadership.(*Candidate).initLock(0xc0000ae000)

2022-10-21T09:54:01.543 app[6b06793b] maa [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:98 +0x2e

2022-10-21T09:54:01.543 app[6b06793b] maa [info] sentinel | github.com/superfly/leadership.(*Candidate).campaign(0xc0000ae000)

2022-10-21T09:54:01.543 app[6b06793b] maa [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:124 +0xc6

2022-10-21T09:54:01.543 app[6b06793b] maa [info] sentinel | created by github.com/superfly/leadership.(*Candidate).RunForElection

2022-10-21T09:54:01.543 app[6b06793b] maa [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:60 +0xc5

2022-10-21T09:54:01.544 app[6b06793b] maa [info] sentinel | exit status 2

2022-10-21T09:54:01.544 app[6b06793b] maa [info] sentinel | restarting in 3s [attempt 3]

2022-10-21T09:54:04.547 app[6b06793b] maa [info] sentinel | Running...

2022-10-21T09:54:05.085 app[a00f4f14] maa [info] sentinel | panic: close of closed channel

2022-10-21T09:54:05.085 app[a00f4f14] maa [info] sentinel |

2022-10-21T09:54:05.085 app[a00f4f14] maa [info] sentinel | goroutine 3344 [running]:

2022-10-21T09:54:05.085 app[a00f4f14] maa [info] sentinel | github.com/superfly/leadership.(*Candidate).initLock(0xc0000afe30)

2022-10-21T09:54:05.085 app[a00f4f14] maa [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:98 +0x2e

2022-10-21T09:54:05.085 app[a00f4f14] maa [info] sentinel | github.com/superfly/leadership.(*Candidate).campaign(0xc0000afe30)

2022-10-21T09:54:05.085 app[a00f4f14] maa [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:124 +0xc6

2022-10-21T09:54:05.085 app[a00f4f14] maa [info] sentinel | created by github.com/superfly/leadership.(*Candidate).RunForElection

2022-10-21T09:54:05.085 app[a00f4f14] maa [info] sentinel | /go/pkg/mod/github.com/superfly/leadership@v0.2.1/candidate.go:60 +0xc5

2022-10-21T09:54:05.086 app[a00f4f14] maa [info] sentinel | exit status 2

2022-10-21T09:54:05.086 app[a00f4f14] maa [info] sentinel | restarting in 3s [attempt 43]

2022-10-21T09:54:08.086 app[a00f4f14] maa [info] sentinel | Running...

2022-10-21T10:00:26.075 app[a00f4f14] maa [info] sentinel | 2022-10-21T10:00:26.075Z ERROR cmd/sentinel.go:1889 cannot update sentinel info {"error": "Unexpected response code: 500 (rpc error making call: leadership lost while committing log)"}

2022-10-21T10:00:26.142 app[a00f4f14] maa [info] keeper | 2022-10-21T10:00:26.142Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-10-21T10:00:26.310 app[6b06793b] maa [info] sentinel | 2022-10-21T10:00:26.309Z ERROR cmd/sentinel.go:1889 cannot update sentinel info {"error": "Unexpected response code: 500 (rpc error making call: leadership lost while committing log)"}

2022-10-21T10:58:37.065 app[6b06793b] maa [info] sentinel | 2022-10-21T10:58:37.065Z ERROR cmd/sentinel.go:1889 cannot update sentinel info {"error": "Unexpected response code: 500 (rpc error making call: leadership lost while committing log)"}

2022-10-21T10:58:37.109 app[a00f4f14] maa [info] sentinel | 2022-10-21T10:58:37.108Z ERROR cmd/sentinel.go:1852 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-10-21T10:58:37.131 app[a00f4f14] maa [info] keeper | 2022-10-21T10:58:37.123Z ERROR cmd/keeper.go:1041 error retrieving cluster data {"error": "Unexpected response code: 500"}

2022-10-21T10:58:44.424 app[6b06793b] maa [info] keeper | 2022-10-21T10:58:44.422Z ERROR cmd/keeper.go:870 failed to update keeper info {"error": "Unexpected response code: 500 (No cluster leader)"}

2022-10-21T11:00:02.267 app[6b06793b] maa [info] sentinel | 2022-10-21T11:00:02.266Z WARN cmd/sentinel.go:276 no keeper info available {"db": "59b64c9d", "keeper": "1449234282"}

2022-10-21T11:05:58.552 app[6b06793b] maa [info] sentinel | 2022-10-21T11:05:58.552Z WARN cmd/sentinel.go:276 no keeper info available {"db": "250e7205", "keeper": "14bf234272"}

This is what I’m getting on my app,

2022-10-21T11:23:03Z app[482108dc] maa [info][028dd3f7-81af-49fb-a72c-94ff4094c224] ActiveRecord::ConnectionNotEstablished (server closed the connection unexpectedly
2022-10-21T11:23:03Z app[482108dc] maa [info]	This probably means the server terminated abnormally
2022-10-21T11:23:03Z app[482108dc] maa [info]	before or while processing the request.
2022-10-21T11:23:03Z app[482108dc] maa [info]):

Interesting, it seems there's an issue retrieving cluster data. I wonder if you can issue a failover and try again? Here are the instructions on how to do that (step 2 in particular is what you want to try): What is the correct process to change the postgres leader region? - #2 by shaun
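
Very roughly, the failover boils down to something like this from inside the db VM; treat it as a sketch and follow the linked post for the exact steps (the keeper ID comes from stolonctl status, and the /data/.env path is an assumption based on Fly's stolon-based image):

fly ssh console -a <db-app>

# Inside the VM: load stolon's connection settings, then tell stolon to
# fail the current leader keeper so another keeper can be promoted
export $(cat /data/.env | xargs)
stolonctl status
stolonctl failkeeper <keeper-id-of-current-leader>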

Thanks, it's working now.

Awesome, glad to hear it!

@shortdiv It's still causing the same problem.

Still happens intermittently for me as well.

Same for us, just before 09:00 CEST today. Replication lag went from 2xx to 3xxxxx during the night and dropped back to 2xx around 08:45.


Same for me, details here: Postgres database apps are crashing again - #23 by julianrubisch

(Sorry for crossposting)

We still have issues with this. Our dev db has been dead for no apparent reason for an hour now, and our prod db can't connect to the cluster and keeps restarting.

Same for me. Please help.
Update: OK, I've restarted it a couple of times and now it works. It was down for almost an hour, and there were some DNS-related issues in the logs.
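
(For anyone else stuck here, the restarts were just along these lines; exact subcommands vary by flyctl version, and the VM ID comes from fly status:)

fly status -a <db-app>               # note the VM ID of the unhealthy instance
fly vm restart <vm-id> -a <db-app>
# some flyctl versions also have a cluster-wide restart: fly pg restart -a <db-app>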

Please post any related logs that you may have.
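
If it helps, recent output from both the database app and the web app can be pulled with something like this (app names are placeholders):

# Stream recent logs from the database app and from the web app
fly logs -a <db-app>
fly logs -a <web-app>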

I’ve backfilled a status page incident here: Fly.io Status - FRA Consul Server unavailable

If you are still having trouble with your postgres database, please reach out.