I’ve set min_machines_running = 1 in my fly.toml to ensure there’s always one instance of my app running, but it doesn’t seem to be working.
I’m trying to run a Directus instance, and its startup time is too slow for it to dynamically scale down to 0. The app runs fine for about 6 minutes, then Fly tries to downscale it, even though I’ve set min_machines_running to 1. I’ve taken inspiration from this example repo: https://github.com/freekrai/directus-fly/blob/main/fly.toml.
My fly.toml looks like this:
# fly.toml app configuration file generated for app-name-goes-here on 2023-06-18T07:20:56+01:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#
app = "app-name-goes-here"
kill_signal = "SIGINT"
kill_timeout = 15
primary_region = "lhr"
[env]
  DB_CLIENT = "sqlite3"
  DB_FILENAME = "/data/database/data.db"
  STORAGE_LOCATIONS = "local"
  STORAGE_LOCAL_DRIVER = "local"
  STORAGE_LOCAL_ROOT = "/data/uploads"
  PUBLIC_URL = "https://url-goes-here.fly.dev"
  PORT = 8080

[experimental]
  allowed_public_ports = []
  auto_rollback = true
  cmd = "start.sh"
  entrypoint = "sh"

[build]
  dockerfile = ".\\Dockerfile"

[mounts]
  source = "directus_data"
  destination = "/data"

[[services]]
  internal_port = 8080
  processes = ["app"]
  auto_stop_machines = true
  auto_start_machines = true
  protocol = "tcp"
  min_machines_running = 1
  script_checks = []

  [services.concurrency]
    hard_limit = 25
    soft_limit = 20
    type = "connections"

  [[services.http_checks]]
    grace_period = "30s"
    interval = "15s"
    method = "get"
    path = "/server/health"
    protocol = "http"
    timeout = 2000
    tls_skip_verify = false
    [services.http_checks.headers]

  [[services.ports]]
    force_https = true
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    grace_period = "30s"
    interval = "15s"
    restart_limit = 0
    timeout = "2s"
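For what it’s worth, I assume I can double-check what’s actually deployed with the standard flyctl commands, something along these lines (the app name is a placeholder):

# Overall app status, including how many machines exist and their state
fly status -a app-name-goes-here

# List the individual machines and the regions they run in
fly machine list -a app-name-goes-here

# Show the app configuration the platform currently has
fly config show -a app-name-goes-here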
The Dockerfile and start.sh file are very minimal:
FROM directus/directus:10.3
USER node
WORKDIR /directus
COPY . .
CMD ["bash", "start.sh"]
#!/bin/sh

# This file is how Fly starts the server (configured in fly.toml). Before starting
# the server though, we need to run any migrations that haven't yet been
# run, which is why this file exists in the first place.
# Learn more: https://community.fly.io/t/sqlite-not-getting-setup-properly/4386

set -ex

mkdir -p /data/database
mkdir -p /data/uploads

chmod -Rf 777 /data/database
chmod -Rf 777 /data/uploads

npx directus bootstrap
npx directus start
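One thing I’m unsure about, given the SIGINT/SIGTERM lines in the logs below: npx directus start runs as a child of the shell, so I don’t know whether the kill_signal Fly sends actually reaches the Directus process itself. If that matters, I assume the usual fix is to exec the final command so it replaces the shell, roughly like this (untested sketch):

#!/bin/sh
set -ex
mkdir -p /data/database /data/uploads
chmod -Rf 777 /data/database /data/uploads
npx directus bootstrap
# exec replaces the shell, so the signal Fly sends goes to the npx/Directus process rather than to sh
exec npx directus start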
These are the logs when the machine starts to downscale:
2023-06-19T06:03:29.054 app[3d8d9349b77438] lhr [info] [06:03:29] GET /server/health 200 8ms
2023-06-19T06:03:44.120 app[3d8d9349b77438] lhr [info] [06:03:44] GET /server/health 200 8ms
2023-06-19T06:03:53.515 proxy [3d8d9349b77438] lhr [info] Downscaling app app-name-goes-here in region lhr. Automatically stopping machine 3d8d9349b77438. 2 instances are running, 0 are at soft limit, we only need 1 running
2023-06-19T06:03:53.521 app[3d8d9349b77438] lhr [info] INFO Sending signal SIGINT to main child process w/ PID 521
2023-06-19T06:03:58.677 app[3d8d9349b77438] lhr [info] INFO Sending signal SIGTERM to main child process w/ PID 521
2023-06-19T06:03:58.973 app[3d8d9349b77438] lhr [info] INFO Main child exited with signal (with signal 'SIGTERM', core dumped? false)
2023-06-19T06:03:58.974 app[3d8d9349b77438] lhr [info] INFO Starting clean up.
2023-06-19T06:03:58.974 app[3d8d9349b77438] lhr [info] INFO Umounting /dev/vdb from /data
2023-06-19T06:03:58.975 app[3d8d9349b77438] lhr [info] ERROR error umounting /data: EBUSY: Device or resource busy, retrying in a bit
2023-06-19T06:03:59.185 app[3d8d9349b77438] lhr [info] [06:03:59] GET /server/health 200 8ms
2023-06-19T06:03:59.728 app[3d8d9349b77438] lhr [info] ERROR error umounting /data: EBUSY: Device or resource busy, retrying in a bit
2023-06-19T06:04:00.480 app[3d8d9349b77438] lhr [info] ERROR error umounting /data: EBUSY: Device or resource busy, retrying in a bit
2023-06-19T06:04:01.232 app[3d8d9349b77438] lhr [info] ERROR error umounting /data: EBUSY: Device or resource busy, retrying in a bit
2023-06-19T06:04:01.987 app[3d8d9349b77438] lhr [info] WARN hallpass exited, pid: 522, status: signal: 15 (SIGTERM)
2023-06-19T06:04:02.000 app[3d8d9349b77438] lhr [info] 2023/06/19 06:04:01 listening on [fdaa:2:5f23:a7b:13e:8e80:bb03:2]:22 (DNS: [fdaa::3]:53)
2023-06-19T06:04:02.985 app[3d8d9349b77438] lhr [info] [ 487.722219] reboot: Restarting system
2023-06-19T06:04:15.341 health[3d8d9349b77438] lhr [error] Health check on port 8080 has failed. Your app is not responding properly. Services exposed on ports [80, 443] will have intermittent failures until the health check passes.
Have I missed something in the configuration to keep one instance always alive? I’m also not sure why the logs say there are 2 instances running when I’ve set the scaling count to 1.
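And in case it’s relevant: is min_machines_running in fly.toml meant to be enough on its own, or does the machine count also need to be pinned explicitly with something like this (assuming I’m reading the docs right)?

fly scale count 1 -a app-name-goes-here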