Hi there, I’ve come across this article in the docs:
What would be a good way to track usage of these machines, though? I think that just “logging” the start time is insufficient, because the task the machine runs exits automatically when it becomes idle, so I have no way of knowing when that happens.
I’ve thought of some kind of heartbeat tracking that checks via the Fly Machines API every second which machines are running and increments the related usage key in my Redis database by one. That adds up to a lot of API requests though, so I don’t know if that is actually recommended.
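A minimal sketch of that heartbeat idea, with a plain dict standing in for Redis (an `INCRBY` on a per-machine key would play the same role). The machine snapshot is hand-written here; the `id` and `state` fields and the `"started"` value are assumptions about what a machine listing returns, and `record_heartbeat` is a hypothetical helper, not part of any Fly library.

```python
from collections import defaultdict


def record_heartbeat(usage, machines, interval_s):
    """Add interval_s seconds of usage for every machine reported as started."""
    for machine in machines:
        if machine.get("state") == "started":
            usage[machine["id"]] += interval_s
    return usage


# Stand-in for Redis: maps machine ID to accumulated seconds.
usage = defaultdict(int)

# Hypothetical snapshot of what one polling round might return.
snapshot = [
    {"id": "m1", "state": "started"},
    {"id": "m2", "state": "stopped"},
]

# Three heartbeats, 10 seconds apart.
for _ in range(3):
    record_heartbeat(usage, snapshot, 10)
```

Each heartbeat credits a whole interval, so the counter can be off by up to one interval per start and per stop. A longer interval means fewer API requests but a coarser count.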
Basically, I want a Redis database to track usage of every machine (which is tied to a user), and every user has a certain amount of usage included every month. I’m fairly certain that I’d have to implement this usage tracking myself, so I’m just asking here what I should do to keep track of which machines are running.
The second question I have: Is there a way to set a max TTL for a given machine upon launch via the Machines API?
Tracking machine usage should be easy: just periodically fetch the created/updated data for each machine ID until you detect that the machine has stopped. You can use Get Machine via REST for this. You’ll thus need to store the machine ID for every machine that your users create.
The best way to calculate uptime for your Machines is to use the Prometheus metrics. You can check the fly_instance_up metric for this: Metrics on Fly.io · Fly Docs
No, there would be no need. You will already have a database record of machines that you believe are running, so every hour read the status of them, until you find they are stopped, at which point you can update your database with their new status.
For each machine you newly discover as stopped, use the API I suggested to get a lifetime in seconds.
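As a rough sketch of that last step: once a poll shows a machine as stopped, the lifetime can be taken as the gap between its created and updated timestamps. The field names `created_at`, `updated_at` and `state`, and the ISO 8601 format with a trailing `Z`, are assumptions about the Get Machine response, and the sample record is made up.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def lifetime_seconds(machine):
    """Return created-to-updated seconds for a stopped machine, else None."""
    if machine.get("state") not in ("stopped", "destroyed"):
        return None  # still running (or in another state): poll again later
    delta = parse_ts(machine["updated_at"]) - parse_ts(machine["created_at"])
    return int(delta.total_seconds())


# Hypothetical record as it might come back from one poll.
sample = {
    "id": "m1",
    "state": "stopped",
    "created_at": "2024-01-01T10:00:00Z",
    "updated_at": "2024-01-01T10:05:30Z",
}
lifetime = lifetime_seconds(sample)
```

This only matches real runtime for a machine that starts once and exits. If a machine is stopped and started again, the created-to-updated gap also includes the time it spent stopped.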
Or listen to Lillian’s advice; after all, they work for Fly!
Sounds reasonable, thank you! Does Fly also bill the machines from this metric? I want to achieve a 1-1 relationship between seconds I bill and get billed, just for the sake of transparency.
Just to note:
The Fly Prometheus Metrics seem to have a scraping interval of 15s. How should I treat that inaccuracy within my service? Is there a way to get the uptime more accurately?
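To put a number on that inaccuracy, here is a small sketch that turns scraped samples into an uptime estimate with bounds. It assumes one contiguous run, evenly spaced scrapes and no missed scrapes; the sample data is invented and `estimate_uptime` is a hypothetical helper.

```python
def estimate_uptime(samples, step_s=15):
    """Estimate uptime from (timestamp, value) samples taken every step_s seconds.

    Returns (estimate, low, high) in seconds for a single contiguous run.
    """
    up = sum(1 for _, value in samples if value == 1)
    estimate = up * step_s
    # The run may have started just before the first "up" sample and ended
    # just after the last one, or up to one step earlier/later on each side.
    low = max(0, (up - 1) * step_s)
    high = (up + 1) * step_s
    return estimate, low, high


# Four consecutive "up" samples, 15 seconds apart.
samples = [(0, 1), (15, 1), (30, 1), (45, 1)]
estimate, low, high = estimate_uptime(samples)
```

Under those assumptions the error stays within one scrape interval per run, no matter how long the machine was up, so it matters most for very short-lived machines.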
Fly.io itself has to have some kind of method to determine machine uptime, right?
After all, Fly needs to calculate usage in units of seconds to bill me the right amount for machine use at the end of the month.
Could there be a way for me to access just that “uptime number” on a specific machine? I mean with the new usage insights, the data has to be somewhere at least.
I still haven’t found a way to more correctly determine a specific machine’s total uptime.
I might go back to the “heartbeat” idea, just scaling it down to something like every 10 seconds and using the machine events like created, started, suspended, destroyed and so on that I can get from the Fly Machines API.
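A sketch of how the event side of that could work: pair each "started" event with the next event that ends the run and sum the gaps. The event shape used here (a `status` string and a `timestamp` in milliseconds since the epoch) is an assumption about the API response, and the event list is made up.

```python
def uptime_from_events(events, now_ms):
    """Sum running time in seconds from start/stop style events."""
    end_statuses = ("stopped", "suspended", "destroyed")
    running_since = None
    total_ms = 0
    # Sort oldest first so the order the events arrive in does not matter.
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if event["status"] == "started" and running_since is None:
            running_since = event["timestamp"]
        elif event["status"] in end_statuses and running_since is not None:
            total_ms += event["timestamp"] - running_since
            running_since = None
    if running_since is not None:
        # Still running: count up to the current time.
        total_ms += now_ms - running_since
    return total_ms / 1000


# Hypothetical history: one finished 60 s run, then a run still in progress.
events = [
    {"status": "created", "timestamp": 0},
    {"status": "started", "timestamp": 1_000},
    {"status": "stopped", "timestamp": 61_000},
    {"status": "started", "timestamp": 100_000},
]
seconds = uptime_from_events(events, now_ms=130_000)
```

With event timestamps, the 10-second heartbeat only decides how quickly a state change is noticed, not how precisely it is measured. If the API only returns recent events, each poll would need to persist the events it has seen.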