Reading the Corrosion blog post, it seems that there is no global locking but rather a “bidding model”, which implies that at-most-once semantics are not guaranteed. Am I right to think that running an app that requires a single writer (e.g. using Litestream) might not be safe on this platform?
It depends on what you’re using as your locking model. I have a queue app that picks up jobs from a Postgres database, and it uses record locking to ensure that a single Fly machine owns a job. Thus I can scale that app as much as I like; even if several instances try to claim the same job at the same time, the app-level logic ensures that only one will succeed.
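To make the "only one will succeed" part concrete, here is a minimal sketch of an atomic claim step. It is not the actual queue app: it swaps Postgres for SQLite from Python's standard library so it runs anywhere, and the `jobs` table, the `owner` column, and the `claim_job` function are all hypothetical names. With Postgres, the same idea is often expressed with row locks such as `SELECT ... FOR UPDATE SKIP LOCKED`.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two unclaimed jobs (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, owner TEXT)")
    conn.executemany("INSERT INTO jobs (id, owner) VALUES (?, NULL)", [(1,), (2,)])
    conn.commit()
    return conn


def claim_job(conn, job_id, machine_id):
    """Try to claim a job; return True only if this machine now owns it.

    The WHERE clause makes the claim conditional: the row is updated only
    while it is still unowned, so a second claimant changes zero rows.
    """
    cur = conn.execute(
        "UPDATE jobs SET owner = ? WHERE id = ? AND owner IS NULL",
        (machine_id, job_id),
    )
    conn.commit()
    return cur.rowcount == 1
```

Two machines racing for job 1 would both call `claim_job(conn, 1, ...)`; whichever update lands first wins, and the other gets `False` and moves on to another job.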
I’m aware that you can use an external service as a distributed lock. However, the question is whether Fly.io gives any guarantee that, with the scale count set to 1, there will be at most one VM running at any time. If not, then I believe this platform is not suitable for running apps that rely on Litestream for backup.
I wouldn’t try to dissuade anyone from building their own extra safety net, though, since it’s easy to imagine corner cases with failed crosstown migrations or such. A custom, in-Machine supervisor that held a lease with the built-in Consul cluster wouldn’t be overkill if the app were very sensitive to multiple writers.
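A rough sketch of what such a supervisor loop could look like, under loud assumptions: `InMemoryLeaseStore` is a hypothetical stand-in so the example runs without any network, and `supervise`, `start_writer`, and `stop_writer` are made-up names. A real version would replace the store with calls to Consul (its sessions with a TTL plus the KV `acquire` operation are the usual building blocks), and would need to handle network errors as "lease lost".

```python
import time


class InMemoryLeaseStore:
    """Hypothetical stand-in for a lease backend such as Consul sessions."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._holder = None
        self._expires = 0.0

    def acquire_or_renew(self, holder, ttl):
        """Grant or extend the lease if it is free, expired, or already ours."""
        now = self._clock()
        if self._holder in (None, holder) or now >= self._expires:
            self._holder = holder
            self._expires = now + ttl
            return True
        return False


def supervise(store, holder, ttl, start_writer, stop_writer, ticks, sleep=time.sleep):
    """Run the writer only while this holder owns the lease.

    Renews every ttl/3 seconds for `ticks` rounds, stopping the writer
    as soon as a renewal is refused.
    """
    running = False
    for _ in range(ticks):
        has_lease = store.acquire_or_renew(holder, ttl)
        if has_lease and not running:
            start_writer()
            running = True
        elif not has_lease and running:
            stop_writer()
            running = False
        sleep(ttl / 3)
    if running:
        stop_writer()
```

Note that a lease like this narrows the window for two writers but does not close it entirely: a Machine that is paused past its TTL can still believe it holds the lease until its next renewal attempt.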
(My impression was that the newer versions of Litestream aren’t so fragile, but maybe I misread that in the v0.5.0 announcement.)
Having said that, single-Machine apps do have a lot of (other) disadvantages on the Fly.io platform…
Perhaps that’s the real source of lingering doubts?
Thanks for the detailed answer. So I think it is safe to assume that having the same machine running more than once, especially with a volume attached, would be considered a major breakage. That answers my question.