What language/framework do you have in your machine? Just wondering whether there could be some issues in your app.
Have you tried shelling in and timing some local HTTP calls? What kinds of timings do you get if you spin up a blank machine in the same region and in the same app, and access your app over private networking?
Sure, but there’s external factors within your control that could cause this; a badly-tuned database is one. I do agree that it is probably Fly networking, but it’s worth doing as much investigation as you can.
I fetch data from the db (not hosted by fly.io) quite frequently and additionally use cache.
If it was the DB, I would notice it from the logs. That is, the time difference between fetching and receiving data from the db would show just from the logs themselves.
This graph shows the response time of your app as seen by the proxy running on the worker host. So communication between the proxy and the machine happens over local interface, no network involved. Full response time that includes the time it took for us to route the request from the edge to the worker is shown on the “Fly Edge” dashboard.
With that information, I don’t think it’s a routing issue, unless you send large requests with body that could’ve became slower if it was indeed a routing issue.
Does you app need to talk to external resources while serving a request?
What kind of requests does your app handle - do they have body or not?
Do you have any kind of tracing to check what takes the app so long to respond?
I just tried curling one of your machines directly from the host where it’s located, and it sometimes takes several seconds to respond.
Could you try doing the same from inside the machine? I checked on 2871259f6e4138 using /api/workers/health/ endpoint.
If you can reproduce it from inside the machine, it might be something with the app itself. The host the machine is running on has plenty of available resources, so I don’t think it’s a noisy neighbor problem. You mentioned the app accesses a DB outside of fly. Does it need to do so when serving the endpoint that I tried? Does it need to talk to other machines in the app while serving the endpoint? Or is the request served based on the information available in the machine and so doesn’t involve any network calls at all?
Ideally, you’ll need to instrument the app to see which part of the app takes so long to respond, otherwise it’s kinda hard to guess where a problem might be.
This remains a poor rule of thumb, as I’ve already mentioned. Something may have changed that is under your control. I’m not having a go at you, but I’d want to help by advocating the right investigation practices.
In light of Pavel’s remarks, I’d suggest baking into your build an extremely trivial listener that operates on a different port. They basically should reply with hello in HTTP plaintext, with as simple a stack as you can muster. Definitely no Django. I should think Python has a simple built-in web server, or has a library for the same.