Redundancy and Scaling App and DB

Hi all,

I’m looking for some best practices for making sure my services are always running.

I have a staging app and a production app, using Phoenix.

Since yesterday I’ve been getting intermittent errors when deploying to staging (via GitHub Actions or manually):

(DBConnection.ConnectionError) connection not available and request was dropped from queue after 2977ms. This means requests are coming in and your connection pool cannot serve them fast enough

To try to fix this, I believe I scaled my database to a single user using scale (the activity log in the dashboard for the database shows “scale”). Mind you, when I ran that command several times it “did not recognise … name”, even though it’s the only visible name for the db that I can find (e.g. via fly apps list in the console). I also cannot find anywhere else to check the scale: “fly scale show” in the console shows the app, not the db.

The replication lag just before one of these failed deployments was 20000, whatever that means, and then dropped to 0. There are only 1 or 2 people max using this staging service, though. If the DB is still multi-user, that would make sense. Surely it must be simple to check the database scaling?

Anyway, none of this would worry me too much. I set up production with a more powerful everything: 8GB, dedicated CPU 4x (including the db, but then how would I check?). But then it fell over with some strange errors:

syd [error] error.code=2002 error.message=“App connection problem” request.method=“GET” request.url="https://app. my_domain .com/company" request.id=“01FBTQTVRQ4JNSRKGEPF97KVZB” response.status=502

Given that error - I would like to check staging but that’s falling over because of the database connection error.

So I guess my query falls into three areas:

  1. How do I see my database scale, and why am I getting so many of these connection errors since yesterday?
  2. How do I create redundancy? Do I create another app which replicates production and then quickly point my CNAME at it if a deployment to the “real” production app fails with error codes?
  3. For a Phoenix LiveView app, is 8GB enough? The app is quite mundane really - and it seems quick usually.

Thanks for your help,

John

Hi John,

AFAIK the Fly CLI by default shows information for the app defined in the nearest fly.toml file.

You can specify which app you are referring to by using the --app <appname> option.

Hi charsleysa,
thanks for your quick reply.
Great,
fly scale show --app mydbname
does show some useful info (once I click pass - “it does not match app name…”)
Thanks!
So it looks like it is a dedicated CPU 1x with 2GB of RAM. In your experience, is this too small for a small number of people to use (i.e. for staging)? Is that why the DB connection errors might be happening?
Cheers
John

Hi John,

I’m not familiar with Phoenix so I can’t tell you what size node to use but I’d recommend testing with a few different sizes to see how they respond to your workloads.

I’d also recommend checking that you’re using things like database connection reuse and request caching.

Are you getting this error when you run migrations, or after a VM is booted?

If I understand correctly:

  1. You’re getting the above error from Phoenix, which is saying it can’t get a connection from the DB connection pool fast enough to do what it needs
  2. You’ve scaled both the app and the DB to 8GB of RAM
  3. You also got that error 2002 message.

The first things to check are your Phoenix connection pool size and total number of connections to Postgres. Did you happen to look at the connection graph on our UI when you checked the replication lag?
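
For readers unsure where those settings live, here is a minimal sketch of a typical Phoenix/Ecto repo configuration. It assumes the repo is configured in config/runtime.exs; :my_app, MyApp.Repo and the POOL_SIZE variable are placeholder names, not taken from John's app, and the values shown are only illustrative.

```elixir
# config/runtime.exs (sketch; :my_app and MyApp.Repo are placeholder names)
import Config

config :my_app, MyApp.Repo,
  url: System.get_env("DATABASE_URL"),
  # Connections opened by EACH running VM. The total number of connections
  # to Postgres is roughly pool_size multiplied by the number of app VMs.
  pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10"),
  # DBConnection queue settings, in milliseconds (library defaults shown).
  # When checkouts keep taking longer than queue_target across a
  # queue_interval, requests start being dropped from the queue, which is
  # what the "dropped from queue" error reports.
  queue_target: 50,
  queue_interval: 1_000
```

With the fallback of 10 in this sketch and two app VMs, the app would hold roughly 20 connections, so that total is the number to compare against what the connection graph shows for Postgres.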

You almost definitely don’t need 8GB of RAM for a LiveView app, especially when there’s not much load.

Redundancy is built in. You can run fly scale count 2 on your Phoenix app and it’ll run two VMs full time. The Postgres cluster should already have two nodes.

I just checked your Elixir app and I think one problem is that it was booting VMs in backup regions. Your DB is running in hkg, and that 502 error was trying to use an app VM in Singapore.

I updated your app to only use hkg. It’s not very intuitive so there’s almost no way you would have guessed this, but the command is: fly regions backup hkg

Hi Kurt,
Thanks so much for getting back to me.

Thanks also for the great podcast with the Thinking Elixir guys - really fired me up about fly.io

To answer your questions:

john:

(DBConnection.ConnectionError) connection not available and request was dropped from queue after 2977ms. This means requests are coming in and your connection pool cannot serve them fast enough

Are you getting this error when you run migrations, or after a VM is booted?

I believe this is happening with migrations, on the smaller app (dedicated 1x 2GB app, with a dedicated 2GB DB), not on the larger one.

If I understand correctly:

  1. You’re getting the above error from Phoenix, which is saying it can’t get a connection from the DB connection pool fast enough to do what it needs
  2. You’ve scaled both the app and the DB to 8GB of RAM
  3. You also got that error 2002 message.

So no, the DB on the larger app (8GB app, 4GB DB) has not suffered any connection problems. The 2002 error message, however, happened on the larger app.

The first things to check are your Phoenix connection pool size and total number of connections to Postgres. Did you happen to look at the connection graph on our UI when you checked the replication lag?

I didn’t do that - next time!

Redundancy is built in. You can run fly scale count 2 on your Phoenix app and it’ll run two VMs full time. The Postgres cluster should already have two nodes.

Fantastic, good to know!

I just checked your Elixir app and I think one problem is that it was booting VMs in backup regions. Your DB is running in hkg, and that 502 error was trying to use an app VM in Singapore.

I updated your app to only use hkg. It’s not very intuitive so there’s almost no way you would have guessed this, but the command is: fly regions backup hkg

Thanks for your help, Kurt! Which app was it, do you know? The small (both db and app on dedicated 2GB) or the large (8GB app, 4GB DB)?

Thanks again,
John

I just did that on the larger app, I didn’t mess with the smaller one. The DB apps both appeared to be running in hkg already. You can check with fly regions list -a <appname>, and then replicate what I did with fly regions backup hkg.

I’m 99% sure you can scale your apps back down to smaller RAM / CPU amounts. The errors you’re seeing shouldn’t have anything to do with RAM unless they’re very busy and you’re somehow using up a ton of DB connections.

Hi again Kurt,
Thanks so much for that additional info.
Cheers
John