Just to make sure I understand: you’re seeing swap usage correlate with the performance complaints? If you have any other metrics or logs that show the performance drop in detail, that would help us nail this down. The time correlation is a strong signal, but more context is always better.
At first glance, the swap usage seems to grow in periodic bursts of a few seconds (roughly every hour). We run maintenance jobs on the hosts periodically, so it could be related. I went ahead and adjusted the schedule on the host running your machine to make them happen less often, just in case that’s related.
That said, I wouldn’t necessarily expect a slowdown just because pages are swapped in for a brief moment. As long as there is Available memory, a swapped page should be paged back into physical RAM the moment it’s read. Since keeping a copy in swap is ‘free’, high usage isn’t automatically an indicator of memory pressure.
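If you want to check this from inside the machine, here is a minimal sketch, assuming a Linux guest where `/proc/meminfo` is readable. It compares swap occupancy with the kernel's `MemAvailable` estimate. The function names `parse_meminfo` and `swap_summary` are made up for illustration; only the field names come from `/proc/meminfo` itself.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most values are reported by the kernel with a "kB" suffix.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_summary(fields):
    """Summarize swap occupancy next to the memory still available."""
    swap_used = fields["SwapTotal"] - fields["SwapFree"]
    return {
        "swap_used_kb": swap_used,
        "mem_available_kb": fields["MemAvailable"],
        # Share of RAM the kernel estimates could be handed out without swapping.
        "available_ratio": fields["MemAvailable"] / fields["MemTotal"],
    }
```

On the machine you would feed it the real file, for example `swap_summary(parse_meminfo(open("/proc/meminfo").read()))`. Nonzero swap usage alongside a high available ratio would match the "swap is occupied but there is no pressure" picture described above.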
I’m specifically wondering about these two questions:
1. Why is the swap growing at all during these short bursts, and
2. Why would your app experience a slowdown during windows where swap usage is stable and there is no memory pressure?
#1. That’s my question to you. Again, let me re-emphasize that this happened to every single machine, across different Docker images, apps, languages, process groups, and CPU groups (perf vs shared), all at the same time. The only thing these machines have in common is that they were all in one org account in one region (NRT). So I highly doubt this has anything to do with my code or a particular Docker image.
#2. I’m not sure I understand the question. Are you asking why using swap would make my app slower than using RAM? The app mentioned in the original post was affected more significantly because it runs libSQL sqld for its database. I’m not sure if this answers your question, and I don’t think it’s very relevant because, again, the memory reallocation happened on every single machine in my org in the region.
Below is the p90/p95 of write operations to libSQL on one of the affected apps during the affected time.
My enquiry isn’t about the performance hit. If my app starts to use swap instead of memory, it will be slower. My enquiry is why all of the machines would suddenly decide to use swap when there is plenty of memory left. Total memory usage didn’t grow; it was just reallocated to swap.
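One way to see where the reallocated memory went is to look at per-process swap. The following is a rough sketch, assuming a Linux guest where `/proc/<pid>/status` exposes the `VmRSS` and `VmSwap` fields. The helper names `parse_status` and `top_swappers` are hypothetical, and the functions take the file contents as strings, so the caller decides how to read them.

```python
def parse_status(text):
    """Extract Name, VmRSS and VmSwap (both in kB) from /proc/<pid>/status text.

    Kernel threads have no Vm* lines, so missing fields default to 0.
    """
    out = {"Name": None, "VmRSS": 0, "VmSwap": 0}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        if key == "Name":
            out["Name"] = value.strip()
        elif key in ("VmRSS", "VmSwap"):
            out[key] = int(value.split()[0])
    return out


def top_swappers(status_texts, limit=5):
    """Return the processes with the most swapped-out memory, largest first."""
    rows = [parse_status(t) for t in status_texts]
    rows = [r for r in rows if r["VmSwap"] > 0]
    return sorted(rows, key=lambda r: r["VmSwap"], reverse=True)[:limit]
```

Feeding it the contents of every readable `/proc/<pid>/status` file would show whether the swapped pages belong to the application process or to something else running in the machine.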
I just checked and it is still happening even after restart.
I found out that all of these machines were (and still are) using FROM denoland/deno:2.6.4 AS base as their base image, so they were all on the same Docker image.
I apologise for the confusion. It is much more likely that this has to do with the application than with Fly Machines.
Seems like the behaviour is back to normal since February 14th.