How to setup and use PGBouncer with Fly Postgres

Looking for best practices on getting PGBouncer deployed on Fly so that we can scale up connection limits for all our prisma clients across about 10 different redundant fly apps.

We have not setup pgbouncer before, so would love to know how to best accomplish this on fly!

I am not really sure what configurations should be when using fly postgres per vm sizes, etc. So any insights here would also be greatly appreciated.

I should also add, that by default there is a connection_limit set for prisma at 3 so we started getting some errors recently that state there are no available connections, so we went ahead and increased this to 10 today just to see what would happen, but not sure how far we can push this, how far we “need” to push this, and when if needed is pgbouncer required to handle these connection limits safely.

Thanks in advance!

Heh, I love this topic so much I spent weeks writing a post on it :smiley: Understanding Connections & Pools

The bad news is there’s no global correct answer, it depends on your app and load, but the good news is that’s there’s some rules of thumb you can use.

Each connection takes about 15MB in your instance’s RAM, give or take a few MB. The RAM also holds a lot of your dataset, preferably the entire thing or at least all the indexes. I love PgHero: GitHub - ankane/pghero: A performance dashboard for Postgres because it surfaces this info really well, and it’s easily deployable on Fly using Docker. So this tells you how many connections you can reasonably support on your PG instances: (RAM - data/index size - baseline OS/PG) / 15.

When it comes to PGBouncer, the number above is what you’d want to configure as max_db_connections — crucially, if you have multiple instances of PGBouncer, you’ll need to make sure the max_db_connections across all of them is less than or equal to the total your database can support.

You can install pgBouncer as a container in its own Fly app, there are few Docker distributions, but I would just choose the Bitnami one: Docker Hub

If you run this you can then configure your connection pool on the app to be pretty high: pgBouncer will hand out cheap fake connections like candy, and map them on to a real 15MB connection only when the app is actually trying to run a query. Depending on the nature of the connection pool and the way the application checks out connections, you can often make your app servers think they have 10x to 100x more connections that you actually have.

Watch out for the pool_mode though: this will only work on a more aggressive pool mode than the default. The session mode maps 1 client connection on to 1 real connection, even if the client is not doing anything. This give your almost no gains, but is necessary if you’re doing things like prepared statements or advisory locks without tracking your connections on the client side.

If you can be careful about / disable your prepared statements and advisory locks and other session level operations, you can switch the mode to transaction, which is much better because even if your clients checkout a connection, they’ll get a cheap fake that’s mapped on to the real connection only when a transaction starts, and for the duration of that transaction.

Or you can go all the way to statement level, if you know exactly what you’re doing with this, and works great for reporting systems etc.

You can also adjust all this to your read replicas — if you run a separate set of pgBouncers for the read replicas only, you can then multiply your total max connections by the number of read replicas you have, and you can probably do statement mode on the replica group.

So yeah, once you have pgBouncer set up and configured, you can raise your connection_limit on Prisma pretty high - maybe have it equal to the max number of requests you expect to handle concurrently on that instance.

1 Like

WOW, just WOW.

This is so much more than I was even expecting, you are absolutely amazing @sudhir.j - this is super helpful!

We have already deployed PGHero on Fly (only took 2 mins :raised_hands:t2:) - deploying simple images like these on fly is SO freaking easy!

Once we increased the connection_limit to 10 on the two main apps which run 2-3 vms each at the moment, we did see a slight spike in connection, which made peak connections go from ~20 to ~60, which makes sense, what is interesting, is that RAM usage really has not been effected.

Again, this is a brand new DB with very little data (50mb) and not a ton of traffic yet, but just trying to get an idea on how far these can be pushed so that we have an overall better understanding on how to manage a healthy setup as things start to scale up.

With this known, would this math make sense with a 2GB VM running on the shared 1x CPU:

max_db_connections = (2000 - 50) / 15 = 132

In this case, I still am not confident how high we can push the connection_limit on the prisma apps, is there a point where too high decreases performance? How would we go about tuning this number?

Thanks again, what an amazing response!

The 132 connection thing sounds fine, but the more connections you have the more CPU you’ll burn thrashing between them etc, so generally the lower the better. DBs do better with a lot of room to breathe.

connection_limit on the prisma side is entirely up to you — I’ve had situations where I’ve had to just do something like 100000 to get things going because the app was checking out connections and not releasing them in the internal pool. This is a bug, though, not what you’d want. But I’d say load test with increasing numbers, and either way you won’t / shouldn’t go past the number of concurrent requests - so your Fly concurrency limit is actually a good mirror for your app’s connection_limit.

It’s also pretty safe to just set a high limit — either way the app will only take what it needs, pgBouncer will only give what is asked and aggressively reclaim zombie connections, and then it’ll only make as many real PG connections as it absolutely needs to run actual queries, and aggressively close the ones it’s not using. All the knobs are on the config list, and most defaults are pretty sensible. This all assuming you’ve switched to transaction mode pooling, of course.

1 Like

This makes perfect sense, thank you so much, when it comes to deploying PGBouncer, any suggestions on Fly VM sizes/resources, etc?

Nothing specific, no, I think it’s a very efficient program, and it only proxies data, so can start with the smallest size and work up if necessary.

1 Like

Makes sense.

As for fly.toml - it should be pretty minimal, no dockerfile required, supply image from bitnami, supply ENV variables, set database connection string via secrets and deploy it in the same region as the fly postgres app leader.

Switch apps to use the internal dns to pg bouncer instead of directly to the fly postgres app internal dns.

Am I missing anything?

Nope, that sounds about right. There shouldn’t any custom code or anything for this, it’ll all just ENV variable knobs.

1 Like

Awesome, thanks so much, I will update here once we have everything successfully running!

Thanks again @sudhir.j

1 Like

@sudhir.j - working on getting PGBouncer deployed, but I am getting an error “Permission denied”. I am 100% sure the hostname, username, password, port are correct.

Configuring credentials
2021-11-05T05:35:54.879 app[09c7d986] ord [info] /opt/bitnami/scripts/libpgbouncer.sh: line 203: /opt/bitnami/pgbouncer/conf/userlist.txt: Permission denied
[build]
  image = "bitnami/pgbouncer:latest"

[env]
  POSTGRESQL_USERNAME="db_user"
  POSTGRESQL_DATABASE="db_name"
  POSTGRESQL_HOST="fly-app-postgres.internal"
  POSTGRESQL_PORT="5432"

  PGBOUNCER_DATABASE="$POSTGRESQL_DATABASE"
  PGBOUNCER_POOL_MODE="transaction"
  PGBOUNCER_MAX_DB_CONNECTIONS="120"
  PGBOUNCER_EXTRA_ARGS="--verbose"

POSTGRESQL_PASSWORD has been set via fly secrets.

Any ideas?

This distribution seems to require a config directory mounted: GitHub - bitnami/bitnami-docker-pgbouncer

If you can mounting a volume as described there it might work. Can also try setting PGBOUNCER_AUTH_TYPE to any, which shouldn’t require this config at all, not sure if that’ll stop the check, though.

Quite a while back I used pgbouncer extensively in high parallelism (read: many clients, many servers) environments. I basically did something similar to what @sudhir.j suggested in terms of calculating how many clients you want to allow from pgbouncer to your host (you need to take the rest of your postgresql config into regards as well - there are per-client settings and global settings), then I deployed a pgbouncer instance on each of my app deployments, making the connection from client to pgbouncer persistent and pgbouncer to postgresql persistent. Skipping connecting has its pro’s (faster) and con’s (consider contract of pgbouncer<>app in terms of transactions) - but its incredibly scalable.

This would be harder to achieve in the dynamic world of auto scaled containers but you would always have a “client max” that would align to “containers max” * “client connections”. As for running multiple processes in a container, I think there’s a few examples around the fly ecosystem.

1 Like

I had a feeling there might be an issue that required a volume based on the error, but wasnt sure.

Tried to set this to “any” and it still is showing Permission denied.

Should we look into a different image or should we add the volume. In the future if we scale pg bouncer for all our read replicas, is having a volume nearby a hassle for every region we scale the read replicas into as well as pg bouncer into?

Yeah, good point. I don’t think it makes sense to use volumes on pgbouncer, it does cut out autoscale capabilities quite a bit. How about edoburu/pgbouncer: Docker Hub ? It specifically talks about not needing this file to be present.

1 Like

Ah this is now running! Much easier.

But now I can’t seem to connect to the pg bouncer instance while connected to my orgs wireguard vpn.

I simply changed the hostname to the pg-bouncer-app.internal from the pg-app.internal and tried connecting but get connection refused, something I am missing?

Port and IPv6 binding are the usual culprits here, so you’d want to double check the port and make sure it listens on IPv6 by setting LISTEN_ADDR to [::] or :: depending on how it’s used. It defaults to 0.0.0.0 which I think only responds on IPv4.

1 Like

BOOM :exploding_head:

LISTEN_ADDR="::"

We are up and running :rocket:

Thanks so much @sudhir.j - you have been absolutely stellar.

1 Like

Here is my final fly.toml file for anyone else looking to get PG Bouncer deployed on fly :balloon:

kill_signal = "SIGINT"
kill_timeout = 5
processes = []

[build]
  image = "edoburu/pgbouncer:latest"

[env]
  DB_USER="db_user"
  DB_NAME="db_name"
  DB_HOST="fly-pg-app.internal"
  DB_PORT="5432"
  
  LISTEN_ADDR="::"
  POOL_MODE="transaction"
  MAX_DB_CONNECTIONS="120"
1 Like

@sudhir.j - I may have spoke too soon :frowning: lol

pooler error: pgbouncer cannot connect to server

Any ideas?

Is the password missing? I’d check the logs and make them as verbose as possible, there’s got to be a real reason in there somewhere.

PS. Both logs I mean, both the pgBouncer and Postgres as well.