I’m puzzled because on a freshly created app on Fly when I run the following command…
fly deploy -a my-new-app --regions ams --vm-size shared-cpu-1x --vm-memory 512
… the machine and volume are created in the “fra” region (some default?). I don’t have a region set in fly.toml, as I want to be explicit about where each deployment goes.
Creating volumes and machines separately and providing a --region flag works, but fly deploy always gets me a machine+volume in “fra”.
Either it’s a bug in the fly CLI, or maybe, since you don’t have a default region set, it defaults to the one closest to you? You can try fly scale count 1 --region <region> to see if that works.
fly scale count 1 --region ams works. I just have to delete the old volume+machine in the fra region manually afterwards.
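For reference, the manual cleanup can also be done from the CLI. A sketch (the IDs are placeholders you’d take from the list output; double-check you’re destroying the right resources):

```
# List the resources that ended up in the wrong region
fly machine list -a my-new-app
fly volumes list -a my-new-app

# Destroy the stray machine first, then its volume
fly machine destroy <machine-id> -a my-new-app --force
fly volumes destroy <volume-id> -a my-new-app
```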
How can I set a default region (other than in fly.toml)? Agreed, it smells a bit like a bug in fly deploy. I’m aiming to have very simple instructions for self-hosting the app, so every additional step is bad. I hope it can be done with a clean fly deploy command.
Generally I’d expect that setting flags on fly deploy explicitly will override any default values. It’s also surprising that if I have provided
[[vm]]
memory = "256mb"
in fly.toml, running fly deploy with --vm-memory 512 will still use the 256. Explicit should always overrule implicit, in my opinion. Or is there some other reasoning behind that I’m overlooking?
There may be confusion as to what this flag means. I use it periodically when I want to only deploy a change to a subset of the regions to which I have deployed an application.
Looking at the code, --regions is an alias for --only-regions.
Later in the code, this flag is used to remove machines from the list to deploy to:
I see no evidence that this flag was intended to be used to determine where the initial deployment should be done to if there are no machines.
Disclaimer: I am a Fly.io employee, but I did not write this code.
Thanks very much for the investigation. Let me describe my use case really quick to add context:
We’re deploying live-editable dynamic websites for clients. Each deployment is a single machine + volume with an sqlite.db. It’s a SvelteKit app running on the Node adapter.
The current deployment workflow is this:
Create a new app.
fly apps create sams-website
Set the env vars.
fly secrets set -a sams-website \
DB_PATH='/data/db.sqlite3' \
ORIGIN='https://sams-website.fly.dev'
We have one codebase for many clients, so we cannot put configuration like region or volume size in fly.toml. That’s why we’d like to set the initial parameters with explicit fly deploy flags.
I have a few questions:
Would it be reasonable to have --regions also determine the initial deployment region? (This would make our lives easiest, I think; could you check whether your team could make that change?)
Is there an alternative way to specify which region the initial deployment should go to? Setting primary_region in fly.toml works, but since we want to deploy to different regions from the same codebase, we need an explicit way to set it (via the CLI?).
Or, asking more generally: what’s the officially suggested way when you can’t have one fly.toml per app (but need multiple variations of the config for different deployments)? Should we consider managing one fly.toml file per client that we don’t check into version control? (I’m a bit worried about maintenance here, as the shared settings would need to be kept in sync.)
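One low-maintenance variant of the per-client-file idea is to keep the shared settings in a single checked-in template and render a small per-client fly.toml at deploy time, then point fly deploy at it with -c/--config. A sketch below; the file names, the __PLACEHOLDER__ tokens, and the render_config helper are all my own inventions for illustration:

```shell
#!/bin/sh
# Hypothetical sketch: one shared template, a tiny per-client render step.
# Only the template is kept in version control; rendered files are ignored.
cat > fly.template.toml <<'EOF'
app = "__APP_NAME__"
primary_region = "__PRIMARY_REGION__"
swap_size_mb = 512
EOF

# render_config <app-name> <region> writes fly.<app-name>.toml
render_config() {
  sed -e "s/__APP_NAME__/$1/" -e "s/__PRIMARY_REGION__/$2/" \
    fly.template.toml > "fly.$1.toml"
}

render_config sams-website ams
# Then deploy with the rendered config:
#   fly deploy -c fly.sams-website.toml -a sams-website
```

This keeps the shared settings in exactly one place, at the cost of a one-line render step before each deploy.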
For more context, this is our fly.toml file with generic settings.
swap_size_mb = 512 # Allocates 512MB of swap memory (make sure --volume-initial-size is set to 2GB or more)
[build]
[experimental]
cmd = ["/app/scripts/start-fly.sh"]
entrypoint = ["sh"]
[mounts]
source = "data"
destination = "/data"
auto_extend_size_threshold = 80
auto_extend_size_increment = "1GB"
auto_extend_size_limit = "5GB"
[http_service]
internal_port = 3000
force_https = true
# set to true to automatically stop Machines when the app is idle for several minutes and reduce costs
auto_stop_machines = false
auto_start_machines = true
min_machines_running = 0
processes = ["app"]
I can make the change, but first I want to explore your use case to see if there is a better solution. In particular, I want to make sure that the CLI we provide isn’t getting in your way.
It sounds to me like you have written code to orchestrate the machines you are deploying, written in perhaps bash or Node.js, and that code is shelling out commands. If you have written an orchestrator, you probably should consider our machines API: Machines API · Fly Docs, with that you can create apps and start machines, no fly.toml required at all. You can even build Docker images yourself and push them to the registry of your choice, and reference those images using our API.
With that as a baseline, what conveniences does our CLI provide that you would miss if you were to take that approach? If you identify something important to you that our CLI does that our Machines API doesn’t, I’ll look into making the change you requested. Otherwise you may find that the Machines API is actually more convenient for you…
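To make the comparison concrete, here is roughly what creating a machine via the Machines API looks like. This is a hedged sketch: the endpoint shape follows the Machines API docs linked above, but the app name, image, and token handling are placeholders, and I have not run these exact calls.

```
# Assumes FLY_API_TOKEN is set (e.g. from `fly tokens create deploy`)
# and that the app already exists.
curl -X POST "https://api.machines.dev/v1/apps/sams-website/machines" \
  -H "Authorization: Bearer $FLY_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "region": "ams",
    "config": {
      "image": "registry.fly.io/sams-website:latest",
      "guest": { "cpu_kind": "shared", "cpus": 1, "memory_mb": 512 }
    }
  }'
```

Note that the region is explicit and per-request here, which is exactly the property missing from the fly deploy flow described above.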
Short-term (within the next 12 months) we’ll have fewer than 30 clients and prefer to do the initial deploy (with the 3 commands I mentioned) and updates (with fly deploy -a sams-website) manually. There are even clients with technical staff who want to do this themselves. So this is a great baseline setup for us.
Long-term: we’re absolutely interested in putting some automation/orchestration scripts in place (maybe even building an “app that manages the apps”) so we can manage 100+ sites (e.g. rolling out an update to all apps and reusing a Docker image). We just can’t invest too much into that in the coming weeks (we’re a two-man bootstrapped business).
For us, using only the CLI and a fly.toml (containing the generic config) is the fastest way to achieve our goals. We’re already there (the only thing that needs a workaround atm is setting the initial deployment region). I’m eager to study the more granular Machines API and use it in the future. Short-term, however, being able to do the initial fly deploy while specifying the region would be wonderful.
Side question, just to make sure I understand the intended philosophy behind your tools: the CLI’s main purpose is to get started fast (deploy the first app), and for more complex workflows the API is recommended. That said, the CLI is a subset of the Machines API. Are those assumptions correct?
That’s a good approximation. Internally to Fly, we have a different approximation which I will share:
There are two types of apps: framework apps (which run fly launch once, ideally use PostgreSQL, and have a pool of machines for high availability and/or local responsiveness), and machines apps (which have a pool of machines that are started potentially as often as when a new request comes in).
Both are, at best, approximations. I claim that there is a third type: a SQLite app with one machine per user. You can read more at Shared Nothing Architecture. My implementation is different from yours in that I use a single Fly app and dynamic request routing, which means that for my use case fly clone is how I provision a new machine.
It is unlikely that I will get to your request today, but I should be able to look into it this week. It is likely a small change, but if I see any issues I’ll report back here.
Cool. If I can stress just one point from those pages it would be: take responsibility for your own backups. You have clients. Their data is on a volume. Given enough time and clients, assume that you will eventually experience a volume failure. We have snapshots, but read the blue box near the top of this page: Manage volume snapshots · Fly Docs
Realistically, it is unlikely to happen to you, but we have a lot more users and over time we have had users lose data and not notice until the snapshots have expired. It is not a pleasant experience.
I implemented an rsync strategy before we had Tigris, and I’m still using it. But these days I recommend litestream and Tigris. I encourage you to give it a try.
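If you do try it later, the setup is small. A sketch of what a Litestream-to-Tigris configuration might look like (the bucket name and credentials handling are assumptions; the endpoint is Tigris’s S3-compatible one):

```
# /etc/litestream.yml, written here via a heredoc for completeness
cat > /etc/litestream.yml <<'EOF'
dbs:
  - path: /data/db.sqlite3
    replicas:
      - type: s3
        bucket: sams-website-backups   # assumption: your Tigris bucket
        path: db
        endpoint: https://fly.storage.tigris.dev
EOF

# Continuous replication (run alongside the app):
#   litestream replicate -config /etc/litestream.yml
# Restore onto a fresh machine:
#   litestream restore -config /etc/litestream.yml -o /data/db.sqlite3 /data/db.sqlite3
```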
Thanks for the warning on volumes. I put hourly backups in place two weeks ago. We do full backups to Tigris, keeping the latest hourly backup of the current day plus 30 daily backups (taken right before midnight).
The reason I haven’t tried Litestream (yet) is that we give clients access to the S3 bucket, so they own their backups too (they can access them via the Tigris dashboard). It would be confusing for them to see all the extra files (WAL logs etc.). Since they have the app source code and the sqlite3 file, they can redeploy on a new Fly machine (or on any other platform, even) without needing us (given that someone can follow the steps in the README).
Just one quick question: I use WAL mode in production, and in order to avoid creating a full copy of the DB on disk (a few clients have 500MB SQLite files, as we store WebP images in the DB as well), I flush the WAL log and temporarily disable auto-checkpointing to safely upload the db.sqlite3 file. Is this a legit approach? (See the code below.)
#!/bin/sh
# We assume the current directory is the root of the project (e.g. /app on Fly.io)
LOG_FILE="/data/backup.log"
touch "$LOG_FILE" # make sure the log file exists
echo "$(date -u):" | tee -a "$LOG_FILE"
# Flush the WAL log into the database file before uploading a backup
echo "Temporarily disabling auto-checkpointing and flushing the WAL log into /data/db.sqlite3..." | tee -a "$LOG_FILE"
sqlite3 /data/db.sqlite3 'PRAGMA wal_autocheckpoint = 2147483647;'
# Re-enable auto-checkpointing even if the upload fails or the script is interrupted
trap 'sqlite3 /data/db.sqlite3 "PRAGMA wal_autocheckpoint = 1000;"' EXIT
sqlite3 /data/db.sqlite3 'PRAGMA wal_checkpoint(FULL);'
# Upload to S3
node sqlite/backup.js 2>&1 | tee -a "$LOG_FILE"
echo "Re-enabling auto checkpointing..." | tee -a "$LOG_FILE"