Hey everyone! I need to make an API so I started searching for providers and found Fly, which seems great for me! (up until now, I’ve only used Firebase’s services, so I’m used to never caring about scaling)
The API should just be one Node.js endpoint that performs two network requests: the first to the googleapis Sheets API and the second to the OpenAI completions API.
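For context, the endpoint is roughly shaped like this (a simplified sketch of what I mean, assuming Express and the official googleapis and openai npm packages; the route, sheet range, and model name are just placeholders):

```typescript
// Sketch only: one endpoint, two outbound requests (Sheets, then OpenAI).
// Assumes OPENAI_API_KEY and Google application-default credentials are set.
import express from "express";
import { google } from "googleapis";
import OpenAI from "openai";

const app = express();
const openai = new OpenAI(); // picks up OPENAI_API_KEY from the environment

app.get("/answer", async (_req, res) => {
  try {
    // 1) Read some rows from a Google Sheet.
    const auth = new google.auth.GoogleAuth({
      scopes: ["https://www.googleapis.com/auth/spreadsheets.readonly"],
    });
    const sheets = google.sheets({ version: "v4", auth });
    const sheet = await sheets.spreadsheets.values.get({
      spreadsheetId: process.env.SPREADSHEET_ID, // placeholder
      range: "Sheet1!A1:B10",                    // placeholder
    });

    // 2) Send them to the OpenAI (chat) completions API.
    const completion = await openai.chat.completions.create({
      model: "gpt-4o-mini", // placeholder model name
      messages: [
        {
          role: "user",
          content: `Answer using this data: ${JSON.stringify(sheet.data.values)}`,
        },
      ],
    });

    res.json({ answer: completion.choices[0].message.content });
  } catch (err) {
    res.status(500).json({ error: String(err) });
  }
});

app.listen(Number(process.env.PORT) || 8080);
```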
I’ve already built part of the API with the default configuration, but I’m thinking I should change it to have one VM always active, so there are no cold starts, and scale horizontally from there.
The thing is, I’m not sure how well it can scale, nor how much it can cost. Should I scale the VMs vertically? How many requests can one machine handle, given what the API does?
I would say that since you’ll want some redundancy from the outset, start with horizontal scaling. While your app is still in development, you can run one machine in one region; as you get some initial users, move to two machines in one region, or a small number of machines across two regions, and so on.
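If it helps, scaling out on Fly is just a command or two with flyctl, along these lines (illustrative only; check `fly scale count --help` for the exact flags on your flyctl version, and the region code here is a placeholder):

```sh
# Two machines in the primary region for redundancy.
fly scale count 2

# Or spread a small number of machines across a second region.
fly scale count 2 --region ams
```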
In terms of individual machine spec, start with 256MB (the smallest machine) and bump up from there. I run a PHP/Laravel/Apache app at that size, and the only thing I’ve had to do is reduce the number of Apache workers to keep the machine from crashing.
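For reference, on recent flyctl versions you can pin that size in fly.toml with a `[[vm]]` section, something like this (a sketch; exact key names can vary between versions):

```toml
# Smallest preset: one shared CPU with 256MB of memory.
[[vm]]
  cpu_kind = "shared"
  cpus = 1
  memory = "256mb"
```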
Your app doesn’t sound demanding at all, though of course it depends on what req/sec throughput you expect. Both Google Sheets and OpenAI enforce rate limits, and you’ll run into those well before you need to grow beyond 256MB.
Yes, I generally would. Small machines are so cheap that having them always running is a good default. You certainly can build elastic setups that wake a machine on incoming traffic, but if you’re still in the development phase, I’d say it’s not worth the distraction for the sake of saving around 5 USD per month.
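Concretely, the "one machine always running" setup is just a few lines in fly.toml, something along these lines (a sketch; the accepted values for the auto_stop/auto_start keys differ a bit between flyctl versions):

```toml
[http_service]
  internal_port = 8080          # whatever port your Node app listens on
  auto_stop_machines = true     # stop extra machines when traffic drops
  auto_start_machines = true    # wake them again on incoming requests
  min_machines_running = 1      # keep one machine up, so no cold starts
```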