PG 53300 error "remaining connection slots reserved" randomly appearing at deploy

Hi,

We use GitHub Actions to manage the CI of our Phoenix 1.5.13 app.

It’s all fine and all, but sometimes, randomly, the deploy process runs into this error toward the end:

FATAL 53300 (too_many_connections) remaining connection slots are reserved for non-replication superuser connections

I cannot find anything related in Fly documentation. I tried to connect to our prod DB (SELECT usename, application_name, state, query FROM pg_stat_activity;), and it seemed that a lot of PG connexions were taken already: 5 from cloudsqladmin, and 25 from postgres. Most of these connexions are idle, though. Besides, that number is much higher than pool_size in our runtime.exs config file, which is set at 10.

So my questions:

  • Has anyone encountered the same problem? Might it be GitHub-related?
  • Are pool_size and the max number of PG connexions related? If not, is there a place to configure it?
  • Should we use a Postgres cluster for our DB? I am looking for the pros and cons of using Fly’s native PG cluster but I can’t find anything convincing…

Thanks!

@fdeage that errors seems to pretty clearly indicate that the deploy process is unable to get a Postgres connection to do the migrations with — don’t think it’s related in any way to Github Actions.

If you already have a large number of connections open, you’d want to see where they’re open and why, in the application itself.

pool_size would be the number of connections each VMs takes out, so you’d expect to have that multiplied by the number of instances you’re running as the baseline connection count.

There’s a lot of information here: How to setup and use PGBouncer with Fly Postgres — most of it is about pgBouncer, but the same details apply whether you decide to use a bouncer or not.

2 Likes

Hi @sudhir.j, thanks a lot for your reply!

After reading your reply, three questions come to my mind:

  • Isn’t the previous app (and its DB connexions) stopped when we deploy the new one? How come these connexions are still open?
  • We only have one shared-cpu-1x with 1GB memory. But as I said, there are 30 connexions taken, and our pool_size was set to 15. So I don’t understand how the deploy can work in some cases
  • As for PGBouncer, isn’t it useful only when you have several apps?

Not quite. The release phase runs before the new image is rolled out to your VMs. This is usually necessary because when you’re running a database migration on the release phase you want all new instances to start with the new migrated schema — the code you’re deploying will likely assume the new schema as well. Wouldn’t want the new instances started before the release phase to error out.

We only have one shared-cpu-1x with 1GB memory. But as I said, there are 30 connexions taken, and our pool_size was set to 15.

I’m wondering if the VM that the release task runs on itself grabbed 15 connections — it’s also just another instance of your app. Not sure, though, would like to page @brainlid for a consult :smiley:

As for PGBouncer, isn’t it useful only when you have several apps?

I’d say it’s useful when your application is taking out connections that it’s not actively using. Your app has 15 connections checked out, for instance, but it actually using all 15 all the time? If the answer is no, pgBouncer will help by giving your app 15 fake connections and mapping then on to real ones only your app actually tries to use them. On a good day, pgBouncer could map all 15 fakes on to a single real Postgres connection, depending on your application’s behaviour.

1 Like

Thanks again! So the previous app and the new one have to share connexions during deploy.

Then, what would be the inconvenience of setting a very high pool_size value, like 200?

I’ll go give PGBouncer a try :slight_smile:

Depends on the behaviour of that particular connection pool system. If it eagerly sets up 200 connections at start up, that would be a waste of resources and probably crash your Postgres DB. If it treated 200 as a max limit and aggressively release connections it wasn’t using, that would be fine. But I think your app is doing the former.

If you had pgBouncer running it wouldn’t make much of a difference, since the 200 connections would all be fake your database isn’t being subjected to much pressure by default.

1 Like

In this particular case, though, you’ll probably want to just reduce the pool_size (if the app isn’t actually doing continuous work with 15 connections), or increase the DB RAM allocation. pgBouncer is an extra layer of complexity that’s not always necessary.

Hello @fdeage!

I just want to confirm what @sudhir.j is saying and add a bit to it.

The setting for pool_size tells your Ecto.Repo how many DB connections to keep open. The default is 10. Setting the value higher will open more active TCP connections to PG, which is not what you actually want.

The more instances you add, the more open DB connections.

Also, as @sudhir.j said earlier, when the app is deployed, a new instance of the app is started (in addition to what is already running). This new app runs the DB migrations. After those are successful, the new app is started up and starts receiving incoming traffic. This is in addition to all your currently running applications. So if I deploy 3 apps with a pool_size of 10, I’ll have 30 open connections. When I deploy, an additional set will be opened for the migrations. After that, a new app instance is started (now at 40 connections). Once it is considered “healthy”, an older app instance is stopped. Dropping to 30 connections. Then it rolls out like this for the rest of my instances. A new instance is started, waits to be healthy, then the old gets shut down. So it will jump between 30-40 connections as it rolls out. This is called a “rolling deploy”.

The PGBouncer is a good option.

I touched on this problem at Elixir Conf (ElixirConf 2021 - Mark Ericksen - Globally Distributed Elixir Apps on Fly.io - YouTube) and offerered an alternative approach. It uses a fly_postgres library, multiple regions, and postgres replicas.

It ends up looking more like this:

Your project’s needs will dictate the best approach for your situation.

Another simple option is to consider how many open DB connections you actually expect/need. It may be more like 5 per app. With a lower pool size, the problem might be solved. If you really do need the connections, you might need to scale up your PG server to give it more resources.

Hope that still helps!

2 Likes

Hi @brainlid, thanks a lot for your reply.

Where could I see the limits of total connections I can “afford” on my instance? Does Postgres blocks new connections because it sees that it lacks resources to do so? I thought it was a hardcoded number somewhere…

Also, since we don’t use a Fly Postgres instance, shouldn’t it mean that the DB is running on the same machine, without needing a TCP connexion?

The limit of max connections is a configuration on Postgres, but there are some considerations to take into account — this stack overflow answer: postgresql - How to increase the max connections in postgres? - Stack Overflow talks about what family of parameters to set.

Just need to remember that the connection count is also limited by RAM, so if you run out of RAM changing the settings won’t help much.

The actual config files are accessible here: Postgres DB connection limits - #2 by jerome

1 Like

Unless you deployed Postgres as part of your Docker container project yourself (definitely not the default… that more like a Docker Compose setup), then your app and the db are running in separate instances that may or may not be on the same physical machine. However, all DB connections will be through the TCP interface.

1 Like