Hi @akutruff, it’s probably not exactly what you’d like, but here’s a basic example of what you could do to scale based on CPU:
#!/bin/bash
if [ -z "$FLY_API_TOKEN" ]; then
  echo "missing required FLY_API_TOKEN environment variable" >&2
  exit 1
fi
api_base=https://api.machines.dev
while getopts a:c: flag
do
  case "${flag}" in
    a) app_name=${OPTARG};;
    c) cpu_threshold=${OPTARG};;
    \?) # Invalid option
      echo "Error: Invalid option" >&2
      echo "usage: ${0} -a app_name -c cpu_threshold" >&2
      exit 1;;
  esac
done
# Environment variables, when set, take precedence over the flags.
app_name="${APP_NAME:-$app_name}"
if [ -z "$app_name" ]; then
  echo "missing required -a flag or APP_NAME environment variable" >&2
  exit 1
fi
cpu_threshold="${CPU_THRESHOLD:-$cpu_threshold}"
if [ -z "$cpu_threshold" ]; then
  echo "missing required -c flag or CPU_THRESHOLD environment variable" >&2
  exit 1
fi
echo "autoscaling ${app_name} based on CPU threshold: ${cpu_threshold}"
while :
do
  sleep_seconds=60
  echo "getting CPU utilization"
  # Share of non-idle CPU time over the last 60 seconds, averaged across all of
  # the app's CPUs, as a whole-number percentage. Falls back to 0 when the
  # query returns no data.
  cpu_util=$(curl -sS -H "Authorization: Bearer ${FLY_API_TOKEN}" \
    https://api.fly.io/prometheus/fly/api/v1/query \
    --data-urlencode "query=sum(increase(fly_instance_cpu{app=\"${app_name}\", mode!=\"idle\"}[60s]))/60 / sum(count(fly_instance_cpu{app=\"${app_name}\", mode=\"idle\"})without(cpu))" \
    | jq -r '((.data.result[0].value[1] // "0") | tonumber) * 100 | floor')
  echo "current CPU utilization: ${cpu_util}"
  if [[ "$cpu_util" -gt "$cpu_threshold" ]]; then
    echo "need moar instances!"
    # Collect the IDs of every stopped machine in the app.
    stopped_machines=()
    for machine in $(curl -sS -XGET -H "Authorization: Bearer ${FLY_API_TOKEN}" -H "Content-Type: application/json" \
      "${api_base}/v1/apps/${app_name}/machines" \
      | jq -jr '.[] | select( .state == "stopped" ) | .id, " "'); do
      stopped_machines+=("$machine")
    done
    size=${#stopped_machines[@]}
    if [[ "$size" -eq 0 ]]; then
      echo "no more machines to start! consider adding more"
    else
      echo "number of machines eligible to start: ${size}"
      # Start one stopped machine, picked at random.
      index=$((RANDOM % size))
      random_machine=${stopped_machines[$index]}
      echo "starting ${random_machine}"
      # --fail makes curl return non-zero on an HTTP error, so a failed start is not reported as a success.
      if curl -sS --fail -XPOST -H "Authorization: Bearer ${FLY_API_TOKEN}" -H "Content-Type: application/json" \
        "${api_base}/v1/apps/${app_name}/machines/${random_machine}/start"; then
        echo "machine successfully started, waiting 5 minutes before checking CPU utilization again"
        sleep_seconds=300
      else
        echo "failed to start ${random_machine}, will retry on the next check" >&2
      fi
    fi
  fi
  sleep "$sleep_seconds"
done
The script above runs a Prometheus query for the app’s CPU utilization and, whenever it is over the threshold, starts one stopped machine at a time until there are no more machines left to start. You could use this in conjunction with our auto_stop proxy feature so your app doesn’t have to worry about when to scale down to zero.
You can deploy the above into a separate App that’s always running, and because we give customers direct access to their metrics, it can be tailored to suit your needs. If your app can quantify its request load well enough to expose it as custom metrics, our system can scrape those, and you could then use them to make your autoscaling decisions.
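As a rough sketch of that last idea, you could swap the CPU query for one based on your own metric. Everything named here is an assumption for illustration: `myapp_queue_depth` is a made-up gauge, the helper function names are hypothetical, and it assumes your custom metric carries the same `app` label as the built-in ones. The endpoint and token handling are the same as in the script above.

```shell
# Hypothetical helpers for scaling on a custom metric instead of CPU.

# Print a PromQL expression that averages a gauge across the app's instances.
# Assumes the metric carries an "app" label.
custom_metric_query() {
  local app_name=$1 metric=$2
  printf 'avg(%s{app="%s"})' "$metric" "$app_name"
}

# Succeed (exit status 0) when the integer value is above the integer threshold.
over_threshold() {
  local value=$1 threshold=$2
  [[ "$value" -gt "$threshold" ]]
}

# Run a query against the same Prometheus endpoint as the script above and
# print the result rounded down to an integer (0 when no data comes back).
fetch_metric() {
  local query=$1
  curl -sS -H "Authorization: Bearer ${FLY_API_TOKEN}" \
    https://api.fly.io/prometheus/fly/api/v1/query \
    --data-urlencode "query=${query}" \
    | jq -r '((.data.result[0].value[1] // "0") | tonumber) | floor'
}

# Example use inside the main loop (not executed here):
#   query=$(custom_metric_query "$app_name" myapp_queue_depth)
#   value=$(fetch_metric "$query")
#   if over_threshold "$value" 100; then
#     ... start a stopped machine as in the script above ...
#   fi
```

The rest of the loop (listing stopped machines and starting one) would stay the same; only the query and the threshold you compare against change.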
Hope this helps!