That’s a fair question; I rushed that out… Messy VM exits can’t be eliminated entirely, even though they should be way less frequent than what you’re experiencing.
I doubt that a two-minute purgatory is intended in such situations. After all, Fly’s architecture cherishes multiple machines as the way to avoid downtime!
Maybe both of us have overlooked some configuration nuance, or maybe LiteFS itself needs patching if it’s going to be fully robust against hardware failure.
In contrast, achieving single-second handover in the (far more common) `fly machine stop` scenario will likely just take a few more dashes of `exec` seasoning, and so on.
(As you’ve already intuited, the signal-passing chain can be surprisingly brittle, in general. Some parts of Unix were designed primarily around convenience of implementation on 1970s hardware…)
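
To make the `exec` point concrete, here is a minimal sketch in Python rather than your actual entrypoint. It is POSIX-only, and `CHILD`, `WRAPPER_EXEC`, and `run_wrapper_and_terminate` are hypothetical names I made up for illustration. The idea: when a wrapper replaces itself with the real program via `exec`, the program keeps the wrapper's PID, so a SIGTERM aimed at the wrapper lands directly on the process that knows how to shut down cleanly.

```python
import signal
import subprocess
import sys
import time

# Hypothetical stand-in for the real server: exits cleanly on SIGTERM.
CHILD = r"""
import os, signal, sys, time
signal.signal(signal.SIGTERM, lambda *_: sys.exit(0))
print(os.getpid(), flush=True)   # report our PID once the handler is installed
time.sleep(30)                   # pretend to serve requests
"""

# Hypothetical wrapper: does its setup, then *replaces itself* with the child.
# After execv there is no intermediate process left to swallow signals.
WRAPPER_EXEC = r"""
import os, sys
os.execv(sys.executable, [sys.executable, "-c", sys.argv[1]])
"""


def run_wrapper_and_terminate(wrapper_src, child_src=CHILD):
    """Start the wrapper, send it SIGTERM, and report what happened.

    Returns (same_pid, exit_code, seconds_until_exit).
    """
    with subprocess.Popen(
        [sys.executable, "-c", wrapper_src, child_src],
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        # The child prints its PID only after its SIGTERM handler is in place.
        child_pid = int(proc.stdout.readline())
        started = time.monotonic()
        proc.send_signal(signal.SIGTERM)
        exit_code = proc.wait(timeout=10)
        elapsed = time.monotonic() - started
    return proc.pid == child_pid, exit_code, elapsed


if __name__ == "__main__":
    same_pid, exit_code, elapsed = run_wrapper_and_terminate(WRAPPER_EXEC)
    print(f"same PID: {same_pid}, exit code: {exit_code}, took {elapsed:.2f}s")
```

If the wrapper had started the child as a subprocess instead, the signal would hit the wrapper first and reach the child only if the wrapper bothered to forward it, which is exactly the sort of link in the chain that tends to break.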