MongoDB high latency on first request

Hello :wave:,

I’ve deployed an app which exposes an endpoint that pings a MongoDB Atlas serverless instance and returns the latency in the response.

I am facing a problem where the first request I make every ~24h returns a much higher latency than the subsequent requests. This is negatively impacting the website that uses the API because when opening the page it takes a while to display the data.

Here’s an example of the responses:

First request:

{
    "ended": "2024-08-03T06:32:35.30184582Z",
    "response_time_ms": 5170,
    "started": "2024-08-03T06:32:30.131418745Z"
}

Second request:

{
    "ended": "2024-08-03T06:42:12.346818877Z",
    "response_time_ms": 134,
    "started": "2024-08-03T06:42:12.212209178Z"
}
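For readers who want to reproduce the measurement, here is a minimal sketch of an endpoint that produces responses in this shape. The thread does not show the app's code, so the names `pingResult`, `measure` and `pingHandler` are hypothetical, the choice of Go is an assumption based on the timestamp and log formats, and the database round trip is replaced by a placeholder function.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// pingResult mirrors the JSON fields shown in the responses above.
type pingResult struct {
	Ended          time.Time `json:"ended"`
	ResponseTimeMS int64     `json:"response_time_ms"`
	Started        time.Time `json:"started"`
}

// measure times a single call to ping. The ping function is a placeholder
// for whatever round trip the real app performs against MongoDB.
func measure(ping func() error) (pingResult, error) {
	start := time.Now()
	err := ping()
	elapsed := time.Since(start) // uses the monotonic clock, so never negative
	return pingResult{
		Ended:          start.Add(elapsed).UTC(),
		ResponseTimeMS: elapsed.Milliseconds(),
		Started:        start.UTC(),
	}, err
}

// pingHandler (hypothetical) reports the measured latency as JSON.
func pingHandler(ping func() error) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		res, err := measure(ping)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(res)
	}
}

func main() {
	// Stand-in for the database round trip; it sleeps instead of pinging.
	fakePing := func() error { time.Sleep(20 * time.Millisecond); return nil }

	// Call the handler in-process and print the JSON body it returns.
	rec := httptest.NewRecorder()
	pingHandler(fakePing)(rec, httptest.NewRequest(http.MethodGet, "/ping", nil))
	fmt.Print(rec.Body.String())
}
```

Measuring inside the handler means `response_time_ms` covers only the database round trip, not the HTTP hop between the client and the app.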

To rule out the problem being from MongoDB Atlas itself, I’ve captured the connect + ping latency running from my local machine after 24h and got:

2024/08/03 07:31:44 connected to mongo in 23.619417ms. [2024-08-03T07:31:44+01:00 -> 2024-08-03T07:31:44+01:00]
2024/08/03 07:31:45 pinged mongo in 619.872125ms. [2024-08-03T07:31:44+01:00 -> 2024-08-03T07:31:45+01:00]
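A local check like this can be sketched as below, with placeholder functions standing in for the driver calls; `timeStep` is a hypothetical helper, not code from the thread. One thing worth keeping in mind when reading the numbers: if the official MongoDB Go driver is in use (an assumption), its documentation says that connecting only starts background monitoring and does no I/O in the calling goroutine, so the first ping may be absorbing the real connection setup. Repeating the ping a few times helps separate that one-off cost from steady-state latency.

```go
package main

import (
	"log"
	"time"
)

// timeStep runs fn once and logs how long it took, in a format close to
// the log lines above.
func timeStep(name string, fn func() error) (time.Duration, error) {
	start := time.Now()
	err := fn()
	elapsed := time.Since(start)
	end := start.Add(elapsed)
	if err != nil {
		log.Printf("%s failed after %s: %v", name, elapsed, err)
		return elapsed, err
	}
	log.Printf("%s in %s. [%s -> %s]", name, elapsed,
		start.Format(time.RFC3339), end.Format(time.RFC3339))
	return elapsed, nil
}

func main() {
	// Placeholders: a real check would call the driver's connect and ping here.
	connect := func() error { time.Sleep(5 * time.Millisecond); return nil }
	ping := func() error { time.Sleep(50 * time.Millisecond); return nil }

	if _, err := timeStep("connected to mongo", connect); err != nil {
		log.Fatal(err)
	}
	// Repeat the ping to compare the first round trip with later ones.
	for i := 0; i < 3; i++ {
		if _, err := timeStep("pinged mongo", ping); err != nil {
			log.Fatal(err)
		}
	}
}
```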

The app configuration has auto_start_machines = false, auto_stop_machines = false and min_machines_running = 1, so I wouldn’t expect any cold starts.

I’d appreciate any ideas about what could be causing this. TIA

Hi,

If it’s only the first request that’s slow, that would suggest there is a cold start. Perhaps the machine is being started on-demand :thinking:

It seems from the docs that min_machines_running applies only to the primary region, and that you also need the other options, e.g. auto_stop_machines, set too:

To keep one or more Machines running all the time in your primary region, set min_machines_running to 1 or higher. min_machines_running has no effect unless you set auto_stop_machines to "stop" or "suspend".

min_machines_running only applies to Machines running in your primary region. If min_machines_running = 1 and there’s no traffic to your app, then Fly Proxy will stop Machines until eventually there is only one Machine running in your primary region.

If the settings exactly match those in the docs, perhaps that’s not it, but that would be the first thing I’d check.

Hello,

Thank you for replying.

I’ve changed the config from

auto_stop_machines = false
auto_start_machines = false
min_machines_running = 1

to

auto_stop_machines = "off"
auto_start_machines = false

I am not convinced it will make much difference, but let’s see.

Look at your app’s Metrics view in the Fly dashboard, then go to the memory usage panel and open Grafana to see whether your app was sleeping before the cold start. Also, use "suspend" instead of "off".

Hi @khuezy,

Look at your app’s Metrics view in the Fly dashboard, then go to the memory usage and open Grafana to see if your app is sleeping prior to the cold start

I see a continuous metric for total and usage memory for the last 2+ days.

Use "suspend" instead of "off"

I want the app to run continuously, and from reading the documentation I understand there are two ways of doing it:

One:

If you need all your app’s Machines to run continuously, then you can set auto_stop_machines to "off" and auto_start_machines to false.

Two:

If you only need a certain number of your app’s Machines to run continuously, then you can set auto_stop_machines to "suspend" or "stop" and min_machines_running to 1 or higher. Note that min_machines_running only applies to your app’s primary region.

The first one is what I am doing. This makes me believe the original problem has nothing to do with this configuration :confused:

So your app isn’t sleeping, which is expected based on your config. So could it be MongoDB Atlas Serverless? Does that have cold starts? From some Googling, it had bad cold starts last year, and there have been no major updates recently.
600ms+ to ping Atlas seems really slow.

Hi,

I am providing a new update on this.

I’ve swapped the MongoDB Atlas instance from serverless to a shared M0 one and I still experience the problem. The first request took 5s, the second 127ms, and subsequent ones take about 13ms.

Back to square zero :confused:

:person_shrugging: Could be a problem w/ Fly’s networking. Or maybe the MongoDB connection pool idled and got disconnected after 10 minutes?
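One way to test the idle-connection theory is to keep a cheap round trip going in the background and see whether the slow first request disappears. The sketch below is only an illustration of that idea: `keepWarm` is a hypothetical helper, the ping is a placeholder for a real database call, and nothing in the thread confirms that idle connections are the cause. If the official MongoDB Go driver is in use (an assumption), its client options also expose pool settings such as `SetMinPoolSize` and `SetMaxConnIdleTime`, which may be worth checking alongside this.

```go
package main

import (
	"log"
	"sync"
	"time"
)

// keepWarm calls ping every interval until the returned stop function is
// called. ping is a placeholder for a cheap database round trip and should
// apply its own timeout.
func keepWarm(interval time.Duration, ping func() error) (stop func()) {
	done := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-done:
				return
			case <-ticker.C:
				if err := ping(); err != nil {
					log.Printf("keep-warm ping failed: %v", err)
				}
			}
		}
	}()
	var once sync.Once
	// stop is safe to call more than once and waits for the goroutine to exit.
	return func() {
		once.Do(func() { close(done) })
		wg.Wait()
	}
}

func main() {
	// Short interval for demonstration only; a real interval would sit
	// below whatever idle timeout is suspected.
	stop := keepWarm(100*time.Millisecond, func() error {
		log.Println("ping")
		return nil
	})
	time.Sleep(350 * time.Millisecond)
	stop()
}
```

If the first request of the day is still slow with a pinger like this running, idle pooled connections are a less likely explanation and the network path becomes the more interesting suspect.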
