Persistent Postgrex Protocol/Connection Errors on Managed Postgres with PgBouncer

Hi everyone,

I’m running a Phoenix app with a managed Postgres cluster, and I keep hitting recurring connection errors like these:

[error] Postgrex.Protocol (#PID<0.2115.0>) failed to connect: 
** (Postgrex.Error) FATAL 08P01 (protocol_violation) server login has been failing, cached error: server conn crashed? (server_login_retry)

[error] Postgrex.Protocol (#PID<0.2115.0>) disconnected: 
** (DBConnection.ConnectionError) tcp recv (idle): timeout

At first I thought this was an app-side issue, but now I see that Fly.io Postgres uses PgBouncer by default for connection pooling. From what I’ve read in the Postgrex docs, this can conflict with named prepared statements unless I explicitly set prepare: :unnamed.

Questions:

  1. Are there any recommended Ecto/Postgrex settings when using Fly’s PgBouncer (timeouts, pool sizes, etc.)?
  2. Could the errors I’m seeing be directly related to PgBouncer (e.g. transaction pooling vs session pooling)? BTW I saw this section in the docs Configuring Connection Pool Mode but I wasn’t able to find those links in the dashboard.

For context:

  • DB is on the Basic plan (1GB RAM, shared CPU).

  • I’m running a mix of web + background jobs (Oban) - very light ( just a POC for my company to migrate from AWS to Fly)

  • App is just 1 machine for now, and I have one repo for oban and one for api so it shouldn’t be hitting pool limits or connection limits.

Here’s my ecto repo config:

  maybe_ipv6 = if System.get_env("ECTO_IPV6") in ~w(true 1), do: [:inet6], else: []

  config :data_pipeline_cron_poc, DataPipelineCron.Repo,
    url: database_url,
    pool_size: String.to_integer(System.get_env("POOL_SIZE") || "12"),
    timeout: 15000,
    socket_options: maybe_ipv6,
    stacktrace: true,
    prepare: :unnamed

  config :data_pipeline_cron_poc, DataPipelineCron.ObanRepo,
    url: database_url,
    pool_size: String.to_integer(System.get_env("OBAN_POOL_SIZE") || "12"),
    timeout: 15000,
    socket_options: maybe_ipv6,
    stacktrace: true,
    prepare: :unnamed

ECTO_IPV6 is properly set to true.

Any advice on how to properly configure Phoenix/Oban/Ecto with Fly’s managed Postgres/PgBouncer would be super helpful!

Thanks :folded_hands:

Hey @willo — thanks for giving MPG a spin. If you’re using the PGBouncer pooled connection, Ecto wants the “transaction” pool mode set, and not the default “session”.

This is mentioned in the docs here, though we’re hoping to roll out some framework-specific docs for MPG soon that will cover this sort of thing.

A reminder, too, that MPG comes bundled with PG-specific support from our crack team of engineers. Just hit the Support tab in the dashboard and raise a ticket from the portal.

2 Likes

Thanks! Even after switching PgBouncer to transaction mode (as recommended above), I was still hitting the same tcp recv (idle) errors, so I reached out to Fly support and was able to resolve the issue.

For anyone else running into the same problem, here’s what worked for me:

  • Prepared statements: When using Ecto with PgBouncer in transaction mode, you need to set prepare: :unnamed. This avoids protocol and idle disconnect errors (per the docs).

  • Pool sizes: “On the Basic plan, keep pool sizes modest — for example, ~8 for the API repo and ~6 for Oban. Oversized pools tend to sit idle, which can trigger the tcp recv (idle) timeouts”

  • Oban + LISTEN/NOTIFY: PgBouncer in transaction mode doesn’t support Postgres LISTEN/NOTIFY (related issue here). “For Oban, you’ll need either: The Repeater plugin, or A small direct Postgres connection just for Oban’s notifier, while leaving the rest behind PgBouncer.”

I went with the Repeater plugin and reduced pool sizes, and since then connections have been stable. Big thanks to the Fly team for the guidance

3 Likes

^^ This should be added to the docs, I was going back and forth with support and this is the information that I needed. Thanks @willo !

1 Like

The latest version of Oban doesn’t appear to require the Repeater plugin. Adding it results in a warning logged instructing the removal of the plugin.

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.