This VM is running MySQL as described in Use a MySQL Database · Fly Docs nothing else running on this VM. I assigned it 2GB, I thought that would be enough. But during the day the memory kept going up until it grinned to a halt.
After that I panicked, killed it and changed the VM to some dedicated machine.
I wonder how this can happen and how I can prevent it from happening again.
It may be worth looking at what the default
my.cnf file contains. I don’t know what defaults it comes with but MySQL can be a memory hog and if it can use memory, it will. Seems like that guide from Fly recommends 2GB and so I would assume that is the intended amount.
The memory used will change over time as it will be used by various things: logs, buffers, query cache … and the number of connections, as each connection will use some memory. So for example if you have some app connecting to it … but not closing a connection, perhaps those are slowly accumulating over time? Total guess but that could be something to explore. As killing it would end those connections. But then they would re-accumulate.
Percona seem to be experts in database optimisation. Like the first page of this blog post:
Though old, this may be helpful too. Gives a general idea about how memory is used:
Thank you. If that’s what I have to do, I think I will just move the DB off to a managed provider like RDS or DigitalOcean. I have maybe 100 users a day using this app, and it’s mostly just some simple SELECT statements on a small DB. The biggest table is about 30k rows. If I already have to start tweaking mysql parameters, then I am scared of scaling up to 1.000, 10.000 users a day etc.
What’s confusing me though is this:
“Memory Utilization” in Grafana is perfectly constant, what I would expect. It doesn’t even look like “Firecracker Memory Usage” in the Fly dashboard:
Yep, given that described usage and those numbers, the grafana looks correct to me too. The Firecracker graph seems to be the outlier. That will be one for Fly to answer. Probably tomorrow now. So it may not actually be a problem with the database at all.
(All I was thinking about looking at the conf is that I’ve no idea what size of server MySQL assumes it is being installed on. It may assume it is going to be installed on a server with e.g 16GB of RAM and therefore would default to using numbers unsuitable for a 2GB server - which could then easily be solved by dropping the numbers down to a correct level, and problem solved. But at this point that doesn’t seem to be needed)
Checkout PlanetScale or Aurora Serverless v2. Even less of a devops overhead, if that’s what you are after.
sqlite isn’t quite ready yet, while there’s
materialize.com, Cloudflare D1 among others, looking to make in roads into the serverless db market.
Try this. Click the title of the graph in Grafana, that should give you a dropdown with “Explore” on it. You can see and play with the actual queries there. Grafana is using
mode() for that graph, I believe our UI is using
avg(), so they’ll look a little different.
Total memory includes page cache. If you run a query like this, you can see page cache:
The page cache is memory used to speed things up. Linux will keep as much memory as you want in the page cache. We’ve gone back and forth on whether to include that in the memory graph. It’s ok for your app to be using almost all available memory, though.
The Fly.io metrics dashboard is based on
mem_free which includes cached memory in its total, while the grafana dashboard is based on
mem_available which excludes cached memory (in use but still ‘available’ if an application requests it).
Take a look at the detailed memory usage panel on the ‘app instance’ grafana dashboard for a more detailed breakdown which should show the cached amount separately.
Thank you all for taking the time to reply!
I might have mistakenly concluded the DB went down. I saw an error regarding my DB in my app logs and assumed that the DB was down, but it might be some other fault in my app. I concluded that the DB was down because when those errors occurred the memory usage was at like 99% in the Fly dashboard, so I assumed that was the problem.
I’m just going to let it do its thing the next few days and see how it goes
This sounds like you may need to tune my.cnf (which you may need to do even on a managed provider).
It’s been a decade or so, but I think the most important parameters are