Very slow international traffic to ORD from Europe

danwetherald · May 31, 2022, 9:19pm

Hello everyone,

We have since day one experienced very slow, basically unusable response times when our customers in Europe try to access our apps that are all hosted in ORD.

These are pretty straightforward GraphQL APIs written with Node/Prisma, just for some background.

When we hosted at Heroku, for instance, responses stayed speedy with very little increase in response times. Is there anything we are doing wrong? It feels like there might be routing issues.

Thanks in advance!

kurt · May 31, 2022, 9:22pm

Can you give us an example request? Also, do you know where in Europe?

One easy way to test this is to launch another app in Europe, SSH to it, and start using your own API. Europe <-> US should actually be faster than Heroku because we do TLS termination in Europe.
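For anyone who wants to try the test above, here is a minimal sketch of a timing script you could run from the machine in Europe. It assumes Node 18+ (global `fetch` and `performance`); `timeRequests`, `summarize`, and the URL in the usage comment are hypothetical names for illustration, not anything Fly provides.

```typescript
// Minimal shape of a fetch-like function, so a fake can be swapped in for testing.
type Fetcher = (url: string) => Promise<{ status: number; text(): Promise<string> }>;

// Time `count` sequential GET requests and return each duration in milliseconds.
async function timeRequests(url: string, count: number, fetcher: Fetcher): Promise<number[]> {
  const samples: number[] = [];
  for (let i = 0; i < count; i++) {
    const start = performance.now();
    const res = await fetcher(url);
    await res.text(); // include the body download in the measurement
    samples.push(performance.now() - start);
  }
  return samples;
}

// Reduce the samples to min / median / max so outliers are easy to spot.
function summarize(samples: number[]): { min: number; median: number; max: number } {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return { min: sorted[0], median, max: sorted[sorted.length - 1] };
}

// Usage (hypothetical URL, replace with one of your own endpoints):
// timeRequests("https://your-app.example/health", 10, fetch)
//   .then((samples) => console.log(summarize(samples)));
```

Running the same script from a US machine gives a baseline to compare against. The first sample will usually be slower because it includes connection setup.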

danwetherald · May 31, 2022, 9:31pm

Hey @kurt

Thanks for such a quick reply! Let me gather a test request for you (most are behind auth), might be best to DM that over to ya

We honestly see this issue almost everywhere outside the US, nowhere in particular, really anywhere overseas.

I will spin up a generic image over there in a few places and see if we see anything out of the norm.

Some other context: when we first started deploying on Fly a while back, we had our PG server in ORD and a few API servers spread across other US regions, and we also saw really bad performance. My guess was that this was caused by Prisma making a ton of requests to the DB from the API server, so a TON of region-to-region requests piled up.

From my understanding, now that everything is in ORD, we should only have the initial trip to the API server per request and one trip back, with a small amount of delay, when a user hits our API from Europe?

Thanks!
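To make the reasoning in the post above concrete, here is a rough back-of-the-envelope sketch. It assumes queries run one after another and that each costs one API-to-DB round trip. The function name and all the numbers are illustrative assumptions, not measurements from this app.

```typescript
// Rough model: one client <-> API round trip, plus one API <-> DB round trip per query.
// Ignores server processing time, TLS setup, and any query parallelism.
function estimateResponseMs(
  clientToApiRttMs: number,
  apiToDbRttMs: number,
  dbQueries: number,
): number {
  return clientToApiRttMs + dbQueries * apiToDbRttMs;
}

// Illustrative (assumed) numbers: 100 ms Europe <-> ORD, 20 DB queries per request.
const colocated = estimateResponseMs(100, 1, 20); // API and DB in the same region
const splitRegions = estimateResponseMs(100, 30, 20); // API and DB in different regions
console.log({ colocated, splitRegions });
```

Under these assumptions the cross-region DB hops dominate the total, which matches the guess about Prisma above: with everything in one region, the long-distance cost is paid only once per request.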

danwetherald · May 31, 2022, 10:01pm

Something that was odd: I was actually seeing some slow requests hanging in the US as well, but after an app restart command, we are now running smoothly in the US as well.

We do have a URL that we use as a health check, but it does not interact with the DB, etc. like a typical request would. It is also responding very quickly now from a FRA instance via fly ssh (~100ms).

Is there anything that would cause something to be “fixed” upon an app restart at the Fly level?

kurt · May 31, 2022, 10:07pm

Not at the fly level, no. Well, probably no. In theory there’s state in our proxies that could cause something like this, but we don’t see anything similar happening on other apps.

I have had this happen on Node apps before. If they’ve been running for days and grind to a halt, it might be a memory or event loop leak. If the health check itself slowed down, it’s probably something in the Node process. This could be exacerbated over long distances just because it is slower for packets to round trip.
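One way to check for the kind of slow degradation described above is to log event loop delay and memory over time and see whether they creep up between restarts. This is a minimal sketch using Node's built-in `perf_hooks.monitorEventLoopDelay` and `process.memoryUsage()`. The `startLoopMonitor` wrapper and the 60-second interval in the usage comment are assumptions for illustration.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Start sampling event loop delay; returns helpers to read a snapshot and to stop.
function startLoopMonitor(resolutionMs = 20) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  const nsToMs = (ns: number) => ns / 1e6; // the histogram reports nanoseconds

  return {
    snapshot() {
      const { rss, heapUsed } = process.memoryUsage();
      const result = {
        loopDelayMeanMs: nsToMs(histogram.mean),
        loopDelayP99Ms: nsToMs(histogram.percentile(99)),
        loopDelayMaxMs: nsToMs(histogram.max),
        rssMb: rss / 1024 / 1024,
        heapUsedMb: heapUsed / 1024 / 1024,
      };
      histogram.reset(); // each snapshot then covers only the latest window
      return result;
    },
    stop() {
      histogram.disable();
    },
  };
}

// Usage: log one JSON line per minute and watch the trend over hours or days.
// const monitor = startLoopMonitor();
// setInterval(() => console.log(JSON.stringify(monitor.snapshot())), 60_000);
```

A steadily rising `heapUsedMb` would point toward a memory leak, while growing loop delay with flat memory would suggest something blocking or piling up on the event loop.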

danwetherald · May 31, 2022, 10:10pm

That makes a ton of sense, I appreciate the explanation.

I will let you know if we continue to see issues. Would you happen to have any suggestions on keeping the apps “fresh”, for example some kind of cron process to restart the apps when new deployments are not happening?
