I deployed my app using LiteFS, and the following scenario breaks synchronization:
Machine 1 and machine 2 are both stopped.
A user requests the app.
Machine 1 starts to serve the request.
Machine 1 receives a POST request.
Machine 1 stops.
A user requests the app again.
Machine 2 starts and is promoted to primary without replicating from machine 1.
Machine 2 receives a POST request and writes it to its own local copy of the database.
You can easily reproduce this by manually starting and stopping machines.
I thought that if all machines were stopped, Fly would remember the last primary instance, so that a newly started instance would replicate from it. However, that does not happen. In fact, this is what I get from fly status:
PROCESS  ID              VERSION  REGION  STATE    ROLE     CHECKS  LAST UPDATED
app      568372e0b7238e  20       arn     stopped  primary          2025-10-21T20:12:36Z
app      9185034db11683  20       arn     stopped  primary          2025-10-21T20:06:13Z
app      e2863e92b77486  20       arn     stopped  primary          2025-10-21T18:48:53Z
My litefs.yml is identical to the one in the docs, except for the rails db:migrate line.
No, this is a known limitation. You need to keep ≥2 Machines running at all times in the primary region; otherwise you will see regressions of the type that you encountered. (Other users have bumped into this in the past.)
Do not combine LiteFS with autostop/autostart on Fly Machines. The Fly Proxy’s autoscaler can shut down or restart Machines without any awareness of LiteFS lease ownership or data freshness, which can result in a stale machine winning the lease and LiteFS discarding newer changes and LTX file data—risking rollback and data loss.
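To make the failure mode concrete, here is a toy Python model of it. This is only a sketch under simplifying assumptions, not how LiteFS actually implements leases or replication: the class name, the in-memory logs, and the rule "the first Machine to start while nobody holds the lease becomes primary" are all made up for illustration.

```python
class ToyCluster:
    """Toy model (hypothetical): each machine keeps its own write log."""

    def __init__(self, names):
        self.log = {name: [] for name in names}
        self.running = set()
        self.primary = None  # nobody remembers the last primary once all stop

    def start(self, name):
        self.running.add(name)
        if self.primary is None:
            # No running primary to replicate from: take the lease as-is.
            self.primary = name
        else:
            # A primary is running: catch up from it before serving.
            self.log[name] = list(self.log[self.primary])

    def stop(self, name):
        self.running.discard(name)
        if self.primary == name:
            # Hand the lease to a still-running replica, if there is one.
            self.primary = next(iter(sorted(self.running)), None)

    def write(self, value):
        if self.primary is None:
            raise RuntimeError("no primary is running")
        # The write lands on the primary and on every running replica.
        for name in self.running:
            self.log[name].append(value)


def stop_start_scenario():
    """Machines never overlap: the second one starts from stale data."""
    c = ToyCluster(["m1", "m2"])
    c.start("m1")
    c.write("post-1")
    c.stop("m1")
    c.start("m2")
    c.write("post-2")
    return c.log["m1"], c.log["m2"]


def overlap_scenario():
    """Two machines run together, so the survivor already has the data."""
    c = ToyCluster(["m1", "m2"])
    c.start("m1")
    c.start("m2")
    c.write("post-1")
    c.stop("m1")
    c.write("post-2")
    return c.log["m1"], c.log["m2"]
```

In the first scenario the two logs diverge (each Machine has a write the other lacks), which matches what you saw. In the second, the surviving Machine already holds the earlier write when it takes over, which is the intuition behind keeping at least two Machines running.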
Hope this helps clear up the uncertainty a little!
Not completely to zero, I don’t think. The Fly Proxy’s autoscaling doesn’t know the dependencies between the different Machines, basically. If you wake up a replica and it can’t connect to the primary, then it will typically balk—if memory isn’t failing me. (I think that extra safeguard was introduced in response to an incident within Fly.io’s own infrastructure a couple years back.)
What you could do is create your own mini-scaler, via Fly-Replay, in a small, dedicated router app that knows to always wake up the primary (via the Machines API) before any of the others, etc. That would be a fair amount of work, though.
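A very rough Python sketch of the decision logic such a router might use is below. It assumes the router has somewhere durable to record which Machine was last primary (that store is hypothetical and not shown), and the app name is invented. It only builds the Machines API start request and the fly-replay response header; it does not send anything, wait for the Machine to become healthy, or handle errors.

```python
import urllib.request

MACHINES_API = "https://api.machines.dev/v1"


def start_url(app, machine_id):
    """URL of the Machines API 'start' endpoint for one Machine."""
    return f"{MACHINES_API}/apps/{app}/machines/{machine_id}/start"


def start_request(app, machine_id, token):
    """Build (but do not send) the authenticated POST that starts a Machine."""
    return urllib.request.Request(
        start_url(app, machine_id),
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def wake_order(machine_ids, last_primary_id):
    """Last known primary first, then the others in their given order.

    last_primary_id is assumed to come from the router's own records.
    """
    if last_primary_id not in machine_ids:
        raise ValueError("last known primary is not in this app")
    rest = [m for m in machine_ids if m != last_primary_id]
    return [last_primary_id] + rest


def replay_header(app, machine_id):
    """Response header asking Fly Proxy to replay the request elsewhere."""
    return {"fly-replay": f"app={app};instance={machine_id}"}
```

For example, with the three Machine IDs from the fly status output above and a hypothetical app called my-litefs-app, wake_order(ids, "568372e0b7238e") puts that Machine first. The router would start it, wait until it is serving, and only then answer the user's request with replay_header("my-litefs-app", "568372e0b7238e").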
Aside: That second link covers a more complicated scenario than what you would really need, but it’s the best single summary of all the pieces that I know of…
So in conclusion, Fly autoscaling is officially not compatible with LiteFS, and your personal advice is to enable autoscaling with a minimum of two machines running.
I don’t know the story of why the official recommendation became more conservative—possibly just lack of docs/Support bandwidth to explain the nuances in really full detail, …