Hi,
I have two apps, one staging and one production.
I set these up in the sea region.
I could connect to both instances with fly ssh console.
All was well.
I went through some issues with failed deployments to the staging version (DB migrations, etc.). I ended up having to run fly scale count 0 on both prod and staging, delete the DBs, and set the count back to 1.
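Roughly, that sequence looked something like this (the app names below are placeholders, and the DB deletion step is elided):

    fly scale count 0 -a app-staging
    fly scale count 0 -a app-prod
    # ...deleted the databases at this point...
    fly scale count 1 -a app-staging
    fly scale count 1 -a app-prod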
However, I found that I could no longer connect to the staging app with fly ssh.
I started poking around, running flyctl ssh console --select, and realized that my production instance was at maa.app-prod.internal (which I could still connect to), and my staging app had three instances: fra.app-staging.internal, sea (ipv6 addr1), and sea (ipv6 addr2).
I then ran fly status and found:
ID REGION DESIRED STATUS
d48083bd ⇡ fra run
81bebe57 fra stop failed
47f9a8ff sin stop failed
c26faa23 sea stop failed
1324dfcc mad stop failed
31f2bcea gru stop failed
d5f65e91 cdg stop failed
277a8d41 fra stop failed
aa13122e gru stop failed
ea6e088c ams stop failed
And I see the regions are all over the place.
Now I have no backup regions set, just sea as my region pool.
So, I have a few questions:
Can someone explain what is going on here?
Why do I have multiple “instances” (not sure if that is the correct word) to connect to for my staging app? How do I delete these?
Why is staging running in France? And why is production running in India?
How do I get back to being able to connect to my staging app?
Back to #1: what did I do to get into this mess, how can I get back to Seattle, and how do I prevent this from happening in the future?
I can’t give any specifics (someone from Fly will be able to), but the general cause is likely the recent issues with the Seattle (sea) region, not anything you’ve done:
Fly did hardware migrations there …
… and so I suspect your instances (yes, that is the correct word, though you could also use ‘vm’, short for virtual machine, which Fly tends to use instead, such as for billing) were migrated from there to other regions in order to keep the app running.
If your regions list had included other regions or backups at the time, Fly would probably have migrated the vm to one of them. Generally it’s better to have an app running in a different region than not running at all in its home region: if a region is down for any reason, your app would be down too until that’s resolved, which would likely take longer than simply moving a vm.
If you only want your app to run in sea, I believe you can “pin” it to that region by attaching a volume created in the same region. That used to be possible, anyway.
I’m not sure what happens if sea has any future issues and your regions list only includes sea, though.
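For example, something like this, if I remember the flyctl syntax correctly (just a sketch; the volume name, size, and app name are placeholders):

    fly volumes create app_data --region sea --size 1 -a app-staging

The volume also needs to be mounted in fly.toml for placement to follow it, as far as I understand.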
Interesting. Thanks for the information. Yes, looking forward to hearing from fly.io.
Until then, my takeaways:
add backup regions (local ones, so I don’t get placed in faraway lands; interesting that the algorithm would choose France or India instead of one of the US locations)
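If I understand the flyctl commands correctly, that would be something like (the region codes here are just examples of nearby US locations):

    fly regions set sea
    fly regions backup ord lax
    fly regions list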
This sounds like an issue where the app thinks it has no specific regions. This can happen if you have volumes created for an app, but not mounted. If you run fly volumes list, it should show you.
You can also run fly regions list to see where it thinks it should be placed. You don’t need backup regions; if the primary regions aren’t doing what you want, the backup regions won’t either.
Are you using the [processes] block in your config?
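Roughly, for anyone checking (the app name is a placeholder):

    fly volumes list -a app-staging
    fly regions list -a app-staging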
OK, so this is an edge case. When you have volumes, our system expects the volumes to control regions. But if you deploy without a [mounts] section, it will just put the VMs any old place.
You can fix this by removing the volumes and then redeploying.
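Roughly (the volume ID is a placeholder):

    fly volumes delete <volume-id>
    fly deploy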
app thinks it has no specific regions. This can happen if you have volumes created for an app, but not mounted
First, I just deleted the two volumes; I will redeploy.
However, because I don’t understand your system, your statement above makes absolutely no sense to me.
If I set a region of sea, my apps start in sea (with or without a volume), and the apps are redeployed without said volumes, why in the world would the system ignore my desire to be in sea and choose these other locations? Again, I ask out of complete ignorance of how this stuff works underneath, knowing full well you are going to give me an interesting explanation. Thanks in advance.
When you create volumes, they wipe out the region you’ve set. Volumes are placed in specific regions so we basically say “run this where the mounted volumes exist”.
It should behave the way you expect; it just isn’t. It’s very much a bug.
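For completeness, mounting a volume looks roughly like this in fly.toml (the source name and mount path are placeholders):

    [mounts]
      source = "app_data"
      destination = "/data"

With that in place, the scheduler should put instances where the named volumes live, per the behavior described above.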
Thanks for that explanation. That makes sense.
Now, however, everything is a mess on both my prod and staging apps (fortunately, this isn’t really production yet).
I have deleted the volumes and redeployed. This ended up deploying code that required new environment variables, so the deployment failed (although it deployed to mia first, and then to dfw, even without volumes).
Interesting: I just checked fly regions again, and now my region pool is dfw, with mia and ord as backups. I guess this is part of the bug again.
So I had to set those and redeploy again. Now I cannot seem to connect to the DB, which is REALLY strange, because during the build process it runs a reset/seed command that apparently executes, but then when it runs the Node app, it cannot access my postgres cluster’s .internal address (which is in sea). Before, when my app was in France/India and my postgres cluster was in sea, it was fine, so I don’t know why mia/dfw cannot access it.
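(For reference, setting the missing variables looked something like this; the variable name and value are placeholders, and setting secrets kicks off another deployment:)

    fly secrets set DATABASE_URL=postgres://user:pass@app-db.internal:5432/app -a app-staging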
I am getting to a place where I just want to delete my apps and start fresh simply to stop messing with this, but since we are checking your platform out to decide whether to use it in production, I have to figure out what the heck is going on here.
I’ll plus one this as something that’s causing some trouble.
We used volumes to anchor our nodes because we had some random moves off into Asia, which we’d rather avoid. We don’t actually need the volumes, and they are now preventing effective blue-green or canary deploys. We currently don’t seem to be able to deploy without downtime (rolling is probably a poor fit for clustered Elixir, but I’m not sure).
But now, if we get rid of the volumes, our apps can go wandering wherever.
So, @kurt,
I have resolved all the issues with regions; both apps are now running in sea. I can ssh console into the prod app, as there is only one instance. However, when I go to ssh console into the staging app, there are three instances presented.
fly status shows one instance running
fly info shows both an ipv4 and ipv6 address
but fly ssh console --select shows three instances, none of which match what I see from fly status or fly info:
? Select instance: [Use arrows to move, type to filter]
> sea (fdaa:0:56ff:a7b:...:a941:2)
sea (fdaa:0:56ff:a7b:...:aa23:2)
sea (fdaa:0:56ff:a7b:...:1754:2)
But I can only connect to the last one.
What is happening here?
And how do I get rid of these?
Note: running the same command on my production app gives me a single instance, with the .internal DNS name (i.e. not the raw IPv6 addresses). Previously, though, when the instances were all over the world, the first option was the .internal DNS name and the other two were IPv6 addresses, and the .internal one did not work for a connection; again, only the last one did.
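For reference, here is roughly what I am comparing, assuming I’m reading fly dig correctly (the app name is a placeholder):

    fly status -a app-staging
    fly dig aaaa app-staging.internal        # private addresses internal DNS returns for the app
    fly ssh console --select -a app-staging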