My question is: the litefs doc Running database migrations · Fly Docs says the recommended way to run db migrations is to have all instances in the primary region execute a db migration command. My current approach means that all instances, including those outside the primary region, would run db migrations on startup. What issues may I run into using my current approach?
I am struggling to get my head around what it means for a non-primary node to run a db schema change. E.g. let’s say there are three instances, two in primary region (instances A & B), one in non-primary (instance C). I just deployed a new version of the app with a new migration file 00001.sql that adds a new db table. Would instance C actually execute this file? I’m thinking if A&B have already run the migration and propagated the schema change over, then no. Or if C actually executes the migration, what can go wrong?
I haven’t run into any issues so far. But maybe that’s because my app is only deployed in the primary region. The app doesn’t need to scale like crazy, so maybe if I stick to deploying to the same primary region only, then effectively I’m following the doc recommendation?
Also a big thank you to anyone who’s worked on litefs and litestream. I’m a big fan of the projects.
I’m no litefs expert, but in general, when you have a “leader-based” consensus system (i.e. one with a primary and followers, or replicas), it does not make sense for the followers to ever perform any local mutations. This is because the whole principle of this system is that the primary is in total control of mutations and it dictates mutations to all followers - this is the means of achieving consensus.
the litefs doc Running database migrations · Fly Docs says the recommended way to run db migrations is to have all instances in the primary region execute a db migration command
Actually what it says is more particular. Quoting:
LiteFS is a single-writer system so only the primary node can write to the database
This is distinct to “all instances in the primary region” in a significant way - only one instance is ever allowed to run migrations, and it must be the primary (leader) instance. Again, this naturally follows from the fact of LiteFS being leader-based.
Just to really drive this home: the primary region has no significance to LiteFS whatsoever (as far as I’m aware). It only cares about the primary node.
You asked what will happen if you simply run migrations on every instance. Probably you’ll see follower instances start to spit out errors. This is because in a leader-based system, followers expect only ever to perform mutations on instruction from the leader. If they perform a mutation of their own accord (like a migration), then they will subsequently receive an instruction from the leader to perform that same mutation (because that’s how the system is designed to work), but it’ll already have been performed. Usually in an RDBMS this causes an exception to be thrown, but it’s possible it won’t if the migration had IF NOT EXISTS-style clauses (I forget if SQLite supports those or not).
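For reference, here is a minimal sketch of that "already performed" failure in plain SQLite, using Python's built-in sqlite3 module. The widgets table is a made-up example, and this only shows how SQLite itself reacts to a repeated CREATE TABLE; it says nothing about what LiteFS does when a replica attempts a write.

```python
import sqlite3

# In-memory database, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE widgets (id INTEGER PRIMARY KEY)")

# Repeating the same statement without a guard raises an error.
plain_create_failed = False
error_message = ""
try:
    conn.execute("CREATE TABLE widgets (id INTEGER PRIMARY KEY)")
except sqlite3.OperationalError as exc:
    plain_create_failed = True
    error_message = str(exc)

# With IF NOT EXISTS the repeated statement is a no-op instead of an error.
conn.execute("CREATE TABLE IF NOT EXISTS widgets (id INTEGER PRIMARY KEY)")

table_count = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'widgets'"
).fetchone()[0]
print(plain_create_failed, error_message, table_count)
```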
In general I get that only the primary node would make write operations. But the doc makes it sound like all nodes in the primary region run database migrations, if you refer to auto-promotion in the doc here.
It even ends by saying “Your migrations must be idempotent as they will be run on each candidate node”.
SQLite does support IF NOT EXISTS and I do use it in my migrations. Each migration is done in a transaction. On success a row is inserted in the table named migrations. On the next startup, the same migration won’t be run again if a corresponding row can be found in the migrations table.
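A rough sketch of that kind of runner, in Python with the built-in sqlite3 module. This is not my actual code: the function name, the users table, and the shape of the MIGRATIONS list are all made up for illustration. It assumes each migration is a list of single statements, and the connection is opened with isolation_level=None so that the explicit BEGIN/COMMIT below are the only transaction control.

```python
import sqlite3

# Hypothetical migrations: (name, list of single SQL statements).
MIGRATIONS = [
    ("00001", ["CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"]),
]

def apply_migrations(conn, migrations):
    """Apply each unapplied migration in its own transaction; return the names applied."""
    conn.execute("CREATE TABLE IF NOT EXISTS migrations (name TEXT PRIMARY KEY)")
    applied = []
    for name, statements in migrations:
        # Take the write lock first so the "already applied?" check and the
        # migration itself happen inside the same transaction.
        conn.execute("BEGIN IMMEDIATE")
        try:
            row = conn.execute(
                "SELECT 1 FROM migrations WHERE name = ?", (name,)
            ).fetchone()
            if row is not None:
                conn.execute("ROLLBACK")  # nothing to do for this migration
                continue
            for sql in statements:
                conn.execute(sql)
            conn.execute("INSERT INTO migrations (name) VALUES (?)", (name,))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
        applied.append(name)
    return applied

# isolation_level=None disables the module's implicit transaction handling.
conn = sqlite3.connect(":memory:", isolation_level=None)
first_run = apply_migrations(conn, MIGRATIONS)
second_run = apply_migrations(conn, MIGRATIONS)
print(first_run, second_run)
```

The second call applies nothing because the row for 00001 is already in the migrations table, which is the idempotency the doc asks for.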
Ah okay, so looking at the comments here, my guess is that:
In general, you want your primary to be in a predefined region, often because you have picked a specific region based on average latency to your users.
E.g. if all your users are in North America, it would be bad if you ended up promoting a node in Singapore (random far away choice) to be leader, because now all writes have to go through Singapore, which is really far away for almost all your users.
Therefore, you configure LiteFS such that only nodes in your chosen primary region are candidates for leadership
In this context, one feasible way to perform migrations is to have every node in the primary region perform the migration and then pick one of them to be leader.
Because any one of them might be chosen, they all need to have performed the migration; if we picked a node that hadn’t migrated, then, necessarily, the migration would not be forwarded to followers.
Because one will be chosen as primary and subsequently it will forward the migration to the other primary region nodes, there’s an idempotency requirement on the migration itself. Would be curious to see if LiteFS does indeed blow up with this migration strategy if your SQL is not written with IF NOT EXISTS clauses.
So basically, the significance of “only primary region nodes migrate” seems to me to be down to a desire to ensure your primary node is in your chosen primary region. The further semantics of this migration strategy then follow from that requirement.
you make all the nodes in the primary region run the migration
all other nodes receive the database updates from one of the nodes in the primary region (the one that becomes primary node in the end)
I think my approach so far has yielded the same result, even though I make all nodes run database migrations. That’s because 1) I have enabled auto-promotion in the primary region and 2) I have only deployed the app to two instances in a single primary region. So on deploy all nodes can and do run database migrations, if there are any.
I suspect I may get errors if I deploy the app to a secondary region, as that node will try to write the database migration changes but can’t because, like you said and to quote the doc, LiteFS is a single-writer system so only the primary node can write to the database.
Yup exactly. You could give it a go with a different cluster if you’re curious, but otherwise I would suggest not trying it out on a cluster with important data.
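If you do end up running the same startup code everywhere, one option is to skip migrations on replicas. As I understand the LiteFS docs, the mount directory contains a .primary file only on replica nodes (it holds the primary's hostname), so its absence suggests the current node is the primary. Treat that as an assumption to verify; the helper names and directories below are made up, and the sketch fakes the LiteFS directory with temp folders. Note that the file can also be missing while no primary has been elected yet, so this check alone is not a guarantee.

```python
import os
import tempfile

def is_replica(litefs_dir):
    """Assumption: LiteFS exposes a .primary file only on replica nodes."""
    return os.path.exists(os.path.join(litefs_dir, ".primary"))

def should_run_migrations(litefs_dir):
    """Only attempt migrations when this node does not look like a replica."""
    return not is_replica(litefs_dir)

# Simulate two nodes with temporary directories standing in for the LiteFS mount.
with tempfile.TemporaryDirectory() as primary_dir, tempfile.TemporaryDirectory() as replica_dir:
    # On the simulated replica, create the marker file with a placeholder hostname.
    with open(os.path.join(replica_dir, ".primary"), "w") as f:
        f.write("primary-hostname-placeholder\n")

    primary_decision = should_run_migrations(primary_dir)
    replica_decision = should_run_migrations(replica_dir)

print(primary_decision, replica_decision)
```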