Issue Deploying to MAA Region Again

We are again facing issues deploying to the MAA region. The worst part is that when the deployment failed, it did not roll back to the previous stable version, which again caused our app to stop. It has been failing for the last 30 minutes now.

v96 is being deployed

3f080c0a: maa pending

v96 failed - Failed due to unhealthy allocations - not rolling back to stable job version 96 as current job has same specification

v96 failed - Failed due to unhealthy allocations - not rolling back to stable job version 96 as current job has same specification and deploying as v97

Our registry has been having intermittent issues related to our previous API outage: Status - API and Dashboard throw 5xx errors

That should not take your app offline, however; it should just fail the current deploy. It looks like your app has one volume and is running a single instance. You’ll need to add a second volume and run fly scale count 2 to prevent downtime during deploys.

I’m getting your app running again now; give it a few minutes.

Do I create the second volume in the same region? How will the data on the volumes sync?

Yes, in the same region. They do not sync, though. Our volumes are meant to be run in pairs for better uptime. Sync is left to the application, however. What are you storing on the volume?

You’re likely better off using a Postgres cluster for storing data. That handles all the high availability, sync, and failover for you. You can run other clustered database-like software if you’d rather.

There are some user-uploaded images and videos that we need to store. Should we use a service like S3 instead?

Can I redeploy now?

Yes, you are better off using S3 because it’ll handle redundancy for you.

You should be able to deploy now. Deploys will always have downtime with a single volume, though.

Why not come up with a storage service on your network? I like Fly :slight_smile: How about a FlyStore!

Anyway, if I remove the volume and then deploy a single instance, will that be okay? Or do I still need to deploy in 2 regions?

Deploys will be zero downtime if there’s no volume mounted. You’re still better off running fly scale count 2 for redundancy, though.

We’d like to do storage! And people do keep asking for it, but it’ll be a while before we’re comfortable adding another service.

Again, would scale count 2 in 1 region be better, or 1 instance in each of 2 regions? I would like to be in MAA as much as possible due to latency. Our primary customers are in India.

Oh, sorry. 2 instances in one region would be better! That’ll spread them across redundant hardware, even in the same region.

You mostly don’t need to put anything in a second region. That will help if we have a full region network outage, but those are very rare.

Can you please expand on this? The docs mention there’s some kind of “anchoring” between scale count and volumes. And I can kind of see how volumes would affect deployments, but I am not sure if I understand it well enough.

Is my understanding right that if my setup is a single instance with a single volume, then new deploys always go down until that volume can be detached from the current instance and attached to the new one? I’d expect a new volume to be attached to the newly deployed instance instead, though…

Yes that’s correct. If you have one volume, the VM using it has to stop before a new one can start. If you have two volumes, this can happen with no downtime.
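To make the reasoning above concrete, here is a toy Python model of a one-at-a-time rolling deploy. It is only a sketch: the function name and the rules it encodes (one volume per VM, a volume serves one VM at a time, old VMs are replaced one by one) are assumptions made for illustration, not a description of how Fly's scheduler is actually implemented.

```python
def min_running_during_deploy(instances: int, volumes: int) -> int:
    """Return the lowest number of VMs running at any point in a deploy.

    Assumptions of this toy model (hypothetical, for illustration only):
    - every VM needs exactly one volume, and a volume serves one VM at a time;
    - old VMs are replaced one at a time;
    - if a spare volume exists, the new VM starts before the old one stops;
    - otherwise the old VM must stop first to free its volume.
    """
    if instances < 1 or volumes < instances:
        raise ValueError("need at least one instance and one volume per instance")

    running = instances
    lowest = running
    spare_volumes = volumes - instances

    for _ in range(instances):
        if spare_volumes > 0:
            # The new VM can start on a spare volume before the old VM
            # stops, so the running count never dips for this step.
            continue
        running -= 1                   # old VM stops and releases its volume
        lowest = min(lowest, running)  # record the dip
        running += 1                   # new VM starts on the freed volume

    return lowest


if __name__ == "__main__":
    for instances, volumes in [(1, 1), (2, 2), (1, 2)]:
        lowest = min_running_during_deploy(instances, volumes)
        print(f"{instances} instance(s), {volumes} volume(s): "
              f"at least {lowest} VM(s) running during the deploy")
```

Under these assumptions, one instance with one volume drops to zero running VMs during a deploy, while two instances with two volumes always keep at least one VM serving traffic.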

You don’t need to use volumes to “anchor” things to regions like this anymore. We now support fly scale count 2 --max-per-region 1, which should let you keep two VMs running in two different regions at all times.
