Hello everyone,
We’ve been experiencing network issues in the LHR region over the past couple of weeks, and I’m curious whether anyone else has encountered similar trouble.
Our situation involves unexpected CPU spikes on a product with modest load, primarily used by our internal team. These spikes are quite irregular and significantly deviate from our expected usage patterns.
During these periods, both nodes struggle with socket operations. They fail to read/write, leading to dropped existing connections (e.g., to the DB) and timeout errors on outbound requests (like ConfigCat cache updates).
Any insights or shared experiences would be greatly appreciated!
My shared experience is probably not relevant, but…
I have an app in LHR that has seen some odd CPU usage since upgrading to a V2 machine-based app. Every now and again it pegs the CPU at 100% until I restart the machine. It ran for ages as a V1 app without doing this.
It may well just be a problem in my code, but it’s odd that it never used to do this, and I haven’t changed the code for ages.
We noticed that one of the cores on one of the instances is fully loaded and are investigating that now.
Where did it start for you?
If (TBC) you’re on an LHR host that is having resource problems, these could be triggering unexpected behaviour in your app (e.g. from IO issues), which in turn causes it to have a funny turn and peg the CPU. In that case you’re always going to have the problem on that host until the underlying issue is resolved.
AFAIK one key difference between V1 and V2 is that with V2/machines - any given machine is tied to a specific host. With V1 if you redeployed you could end up on another host (“fixing” host-specific problems) - with V2/machines this isn’t the case.
I’ve had a few host-specific issues in the past, threads that may be of interest:
Possible network problem (periodic spiking latency) in LHR/on host 81b8?
Feature Request: FLY_HOST environment variable
Your issue(s) may indeed be V2-specific and/or app-specific, and nothing to do with the host(s) - but it could be worth considering/investigating.
Yes, I’ve been experiencing the same issue.
I’ve got a single app in LHR, and since yesterday deployments via GitHub Actions have been failing with “Error: failed to update VM xxxxxxxxxxxxxx: aborted: could not reserve resource for machine: insufficient memory available to fulfill request”.
Did you also observe one of the cores maxing out? Could you please describe your symptoms exactly?
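In case it helps anyone check for the "one core maxed out" symptom, here is a rough sketch that compares two snapshots of /proc/stat and reports per-core utilisation. It assumes a Linux guest with /proc mounted and Python available on the machine. The function names and the 95% "saturated" threshold are my own choices for illustration, not anything provided by the platform.

```python
import time


def parse_proc_stat(text):
    """Return {core_name: (busy_jiffies, total_jiffies)} for each per-core line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 ..."; skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
        # guest time is already counted in user, so only the first 8 fields are summed.
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values[:8])
        cores[fields[0]] = (total - idle, total)
    return cores


def core_utilisation(before, after):
    """Return {core_name: busy_fraction} between two parsed snapshots."""
    usage = {}
    for name, (busy1, total1) in after.items():
        if name not in before:
            continue
        busy0, total0 = before[name]
        elapsed = total1 - total0
        if elapsed > 0:
            usage[name] = (busy1 - busy0) / elapsed
    return usage


def sample(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and compare them."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return core_utilisation(before, after)


if __name__ == "__main__":
    for name, frac in sorted(sample().items()):
        # 0.95 is an arbitrary threshold chosen for this sketch.
        flag = "  <-- saturated" if frac > 0.95 else ""
        print(f"{name}: {frac:6.1%}{flag}")
```

Running it a few times during a spike should show whether the load sits on a single core or is spread across all of them, which is useful detail to include when describing symptoms.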
Thanks, that makes sense.