Flycast for postgres

Hey folks! We have some new fresh produce for you full of some of our favorite things: flycast and postgres!

TL;DR

You can switch from DNS-based resolution for your Postgres instances to Flycast-based connections by running flyctl pg add_flycast against your Machines-based Postgres cluster. Future flyctl pg attach invocations will then use the Flycast address instead of the .internal domain. You can also find your new Flycast IP manually by running flyctl ips list.

What is flycast again?

Good question! A Flycast IP is a private IPv6 address that routes through the proxy instead of going directly to the instances in your app, meaning you get all the load balancing, rate limiting, and other sweet proxy features you know and love.

Why?

  1. It removes DNS from the critical path of PG connections
  2. We get the safety/service-level checks/smart routing from the proxy for free
  3. It opens the gateway for things like scale to zero postgres (flycast will already start up stopped postgres machines)

Going forward

This will become the default for new postgres clusters sometime in the future

Hi there,

Is there any chance that this release/deploy could be causing issues for existing apps using flycast?

We have a service that was just deployed ~15 mins ago that uses flycast for ingress/egress, and currently can’t connect to other apps on the network (hangs, then connection reset)

Rolling back to the previous version didn’t fix it, which is what’s leading me to believe it’s related to the infra.

We’re able to work around it by using the .internal DNS namespace, but this is causing degradation in other ways (occasionally trying to connect to processes that aren’t listening for http connections)

Could someone please look into this? Or direct me to a better forum to report?

Seems to be an unrelated network problem that we are aware of. We’re looking into it.

This should now be fixed. Can you confirm?

Only a few hosts (new) were affected: Fly.io Status - Flycast connectivity broken from certain (new) hosts

Confirmed that it’s fixed for our app, thanks!

Are post-mortems conducted/publicized for these incidents? This caused ~30 mins of downtime during key hours for us - not good. We were lucky that we eventually found a workaround with the .internal address, but until then we were totally hosed.

We don’t have a great system for public postmortems yet, but we frequently share details on the forum and by email, or when people ask!

We are in the process of adding lots of new regions/workers to keep up with demand, which is a good problem to have! While we were provisioning a new batch of hosts today, some new deploys ended up on those hosts before they were ready. Specifically, not all of our network provisioning had finished yet, hence the broken Flycast. We manually deployed the proxy on those hosts, which fixed Flycast routing.

@DAlperin What is possible today for scale to zero postgres?

You’d need to install your own Postgres version that knows how to exit when it’s idle. Our proxy will happily start Postgres up when a new connection comes in, but it won’t ever shut it down after that.
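For anyone experimenting with the "exit when idle" part, here is a minimal sketch of the waiting logic in Python. It is not an official Fly.io mechanism. `wait_until_idle` and `count_connections` are hypothetical names: `count_connections` stands for whatever you use to count client connections (for example, a query against `pg_stat_activity`), and the timeout values are arbitrary.

```python
import time
from typing import Callable


def wait_until_idle(
    count_connections: Callable[[], int],  # hypothetical: returns current client connection count
    idle_timeout: float,                   # seconds with zero clients before giving up
    poll_interval: float = 5.0,            # seconds between checks (arbitrary default)
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Block until no client has been connected for at least idle_timeout seconds."""
    last_active = clock()
    while True:
        if count_connections() > 0:
            # Someone is connected: reset the idle timer.
            last_active = clock()
        elif clock() - last_active >= idle_timeout:
            # Idle long enough: return so the caller can stop Postgres and exit.
            return
        sleep(poll_interval)
```

Once this returns, a wrapper process would stop Postgres cleanly and exit so the Machine stops. Per the reply above, the proxy would then start it again on the next incoming connection.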

Thanks @kurt I’ll experiment with possible approaches.

I’ve just tried this on my Postgres V2 cluster (non-flex). The process tried to update my read replica first, and it’s now unhealthy:

Updating machine e784e299c65658
  Waiting for e784e299c65658 to become healthy (started, 1/3)
Machine e784e299c65658 updated successfully!
? This will overwrite existing services you have manually added. Continue? (Y/n) 

Here’s the log

2023-04-26T07:53:05.397 app[e784e299c65658] ams [info] keeper | 2023-04-26T07:53:05.397Z ERROR cmd/keeper.go:719 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2023-04-26T07:53:06.532 app[e784e299c65658] ams [info] exporter | INFO[0699] Established new database connection to "fdaa:0:a7e8:a7b:10d:1e2:f5c1:2:5433". source="postgres_exporter.go:970"
2023-04-26T07:53:06.581 app[e784e299c65658] ams [info] keeper | 2023-04-26T07:53:06.581Z ERROR cmd/keeper.go:1572 cannot move from master role to standby role

flyctl ips list isn’t showing anything.

I don’t want to accidentally take down my db, and it’s not clear whether this is intended behaviour. What should I do? stolonctl isn’t showing any stale / bad keepers.

Is there a way to opt out of this so that I don’t have to rewrite and replace the DATABASE_URL secret? We use Postgres listeners, and the Flycast proxy terminates connections after 60s.

I’m unable to connect to Postgres via a connection string with flycast. It only works if I replace flycast with internal. What configuration do I need in order to use flycast?
I’m using an Axum Rust backend API with sea-orm and sqlx-postgres, which gives this error:
Error connecting to the database: Conn(SqlxError(Io(Custom { kind: UnexpectedEof, error: "tls handshake eof" })))

Hey @Quintessa, do you have the pg_tls handler set, as described here? New proxy handler: pg_tls (PostgreSQL sslmode)

Thanks for the reply. I’ve added pg_tls but it still gives an error when deploying:

Error:

Error connecting to the database: Conn(SqlxError(Io(Custom { kind: UnexpectedEof, error: "tls handshake eof" })))

fly.toml:

app = "test"
primary_region = "lhr"
kill_signal = "SIGINT"
kill_timeout = "5s"

[env]
  PORT = "8080"
  ENVIRONMENT = "production"

[[services]]
  protocol = "tcp"
  internal_port = 8080

  [[services.ports]]
    port = 80
    handlers = ["http"]

  [[services.ports]]
    port = 443
    handlers = ["tls", "http", "pg_tls"]
  [services.concurrency]
    hard_limit = 25
    soft_limit = 20

  [[services.tcp_checks]]
    interval = "15s"
    timeout = "2s"
    grace_period = "1s"
    restart_limit = 6

Ohh no, I meant that you needed to add “pg_tls” to your Postgres service configuration, not the app connecting to it. But since you are using Flycast, which goes over our encrypted WireGuard network, you don’t need TLS, so make sure you disable it by adding ?sslmode=disable to the end of your connection string:

postgres://<username>:<password>@<postgres-app-name>.flycast:5432/<database>?sslmode=disable

After 3 hours of trial and error, I finally got it working with sslmode=disable.

It took another half hour to find this comment and validate my fix. Thanks a lot!
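For anyone else landing here, the fix from the reply above (point the URL at the .flycast hostname and append sslmode=disable) can be sketched as a small Python helper. This is only an illustration using the standard library. The function name `to_flycast_url` and the example app name are made up, and it assumes your existing URL has the usual postgres://user:password@host:port/database shape.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_flycast_url(database_url: str, pg_app_name: str) -> str:
    """Point a Postgres URL at <pg_app_name>.flycast and set sslmode=disable."""
    parts = urlsplit(database_url)

    # Keep the credentials and port as they are; swap only the hostname.
    userinfo, _, _ = parts.netloc.rpartition("@")
    port = parts.port or 5432  # assume the default Postgres port if none is given
    netloc = f"{pg_app_name}.flycast:{port}"
    if userinfo:
        netloc = f"{userinfo}@{netloc}"

    # Drop any existing sslmode, keep other parameters, then add sslmode=disable.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "sslmode"
    ]
    query.append(("sslmode", "disable"))

    return urlunsplit((parts.scheme, netloc, parts.path, urlencode(query), parts.fragment))


# Hypothetical app name "my-pg", for illustration only.
print(to_flycast_url("postgres://user:pw@my-pg.internal:5432/mydb", "my-pg"))
```

Disabling TLS here relies on the point made earlier in the thread that Flycast traffic already travels over the encrypted WireGuard network.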