VM shuts down and a replacement is never created

The VM in our primary region will randomly shut down and stay unavailable. That breaks our whole app, because with the multi-region Postgres setup it is the only VM that can write to our DB. Is there a way to get better logs or insight into why this is happening?

I think better region support around scaling would be super helpful. We currently can’t use autoscaling because it doesn’t guarantee region placement (i.e. after some deploys, autoscaling wouldn’t create a VM in our primary region), which makes it unreliable with the multi-region Postgres setup. Ideally, it’d be great to be able to set region-specific scale counts or a minimum count per region, or to have some other way to guarantee region placement with autoscaling. I believe this is a known limitation of the current autoscaling solution.
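To make the requested "minimum count per region" constraint concrete, here is a minimal sketch of the check such a policy would enforce. It is not a Fly.io feature or API: the function name, the input shapes (one region code per running VM, plus a dict of per-region minimums) and the example placement are all hypothetical.

```python
from collections import Counter


def regions_below_minimum(vm_regions, minimums):
    """Return {region: shortfall} for every region with fewer VMs than required.

    vm_regions: list of region codes, one entry per running VM (hypothetical input).
    minimums:   dict mapping region code -> minimum VM count (hypothetical policy).
    """
    counts = Counter(vm_regions)  # missing regions count as 0
    return {
        region: required - counts[region]
        for region, required in minimums.items()
        if counts[region] < required
    }


if __name__ == "__main__":
    # Hypothetical placement after a deploy: autoscaling put both VMs outside "iad".
    running = ["lhr", "syd"]
    policy = {"iad": 1}  # the primary region must always keep the writer VM
    print(regions_below_minimum(running, policy))  # {'iad': 1}
```

A non-empty result is exactly the situation described in the post: the scale count is satisfied overall, yet the primary region has no VM that can write to the database.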

This is an app with scale >1 and VMs in multiple regions, right? And the primary region VM just vanishes?

Is it possible that VM is crashing repeatedly? If it is, fly status -a <app-name> --all will show VMs in a failed state. If you _do_ have failed VMs, you can run fly vm status <id> to see whether it failed because of a non-zero exit code or failing health checks, and fly logs -i <id> to see the last log lines we got for the VM.
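If you want to run these checks in one go, a small wrapper could look like the sketch below. It only uses the three flyctl commands and flags from the reply; the wrapper itself, its function names and the command-line usage are hypothetical. It deliberately does not parse flyctl output, since the output format isn't described here, and it assumes `fly` is on your `PATH`.

```python
import subprocess
import sys


def diagnostic_commands(app_name, vm_id=None):
    """Build the flyctl commands to run, in order (hypothetical helper)."""
    commands = [["fly", "status", "-a", app_name, "--all"]]
    if vm_id is not None:
        # Only useful once `fly status --all` has shown the ID of a failed VM.
        commands.append(["fly", "vm", "status", vm_id])  # exit code vs. health checks
        commands.append(["fly", "logs", "-i", vm_id])    # last log lines for that VM
    return commands


def run_all(app_name, vm_id=None):
    for cmd in diagnostic_commands(app_name, vm_id):
        print("$", " ".join(cmd))
        # check=False: keep going even if one command exits non-zero.
        subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # Usage: python fly_diag.py <app-name> [<vm-id>]
    run_all(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else None)
```

Run it first with just the app name; if a VM shows up as failed, run it again with that VM's ID.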

If you want to use autoscaling with multiple regions and Postgres, you can do it with two apps: a writable one that runs in only one region, and a read-only one that autoscales. Most people don’t want to do this, but it is possible!
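With two apps, something has to decide which app handles each request. A minimal sketch, assuming you split by HTTP method (treating GET/HEAD/OPTIONS as never writing to the database) and using hypothetical app names; how requests are actually forwarded between the apps (proxy, DNS, etc.) is not shown here.

```python
# Hypothetical app names; substitute your own.
WRITER_APP = "myapp-writer"  # single region, next to the Postgres primary
READER_APP = "myapp-reader"  # autoscaled, reads from local replicas

# Assumption: requests with these methods never write to the database.
READ_ONLY_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


def target_app(http_method):
    """Pick which of the two apps should handle a request."""
    if http_method.upper() in READ_ONLY_METHODS:
        return READER_APP
    return WRITER_APP


if __name__ == "__main__":
    print(target_app("GET"))   # myapp-reader
    print(target_app("post"))  # myapp-writer
```

Defaulting unknown methods to the writer app is the safe choice: a misrouted read only costs latency, while a misrouted write would fail against a read-only replica.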


Great thank you @kurt!