I don’t know how to use Terraform, so I just made the resources by hand. Maybe AWS CDK scripting is possible for this?
Works very well; latency is good if you’re in the same region.
Single domain pointed at the app by A and AAAA records on Route 53.
Deployed on Apps v1 (Nomad), which has been having issues.
End State
Exactly the same, except powered by Machines.
Process
Automatic migration is not yet available. Even if it were, I’d only use it if it created a completely separate app and then allowed me to test / cut over on my own. I have no reason to believe it would work otherwise, though.
Allocated IPs as per above post, and executed the run command.
Generated a new WireGuard config and deployed it to my EC2 instance. This is because I took the opportunity to move my app to a real organization, and WireGuard configs are organization-specific. Likely doesn’t affect most people.
Grabbed all secrets by running env on the v1 machine, then used pbpaste | fly secrets import to import them into the new app. This ensured the secrets were neither written to my machine nor shown in my console.
Once app was running, I cloned the machine (follow instructions in scale docs) to get multi-instance. Plus, I set the CPU + memory to what I wanted.
Debugged using the fly-specific dev domain. This meant I could test it without touching my main app.
Leaving old app alone for a few days to make sure DNS changes propagate. Will decommission once I stop seeing web requests.
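The secrets step above copies the whole env output, which also contains variables such as PATH and HOME that are not secrets. A minimal sketch of one way to narrow that down, assuming each value fits on a single NAME=VALUE line. The function name filter_secrets, the variable names, and NEW_APP_NAME are placeholders, not anything from the original post.

```shell
#!/usr/bin/env bash
# Keep only the named variables from env-style input (one NAME=VALUE per line).
# Hypothetical helper: multi-line values are not handled.
filter_secrets() {
  local line name wanted
  while IFS= read -r line || [ -n "$line" ]; do
    name="${line%%=*}"            # everything before the first "="
    for wanted in "$@"; do
      if [ "$name" = "$wanted" ]; then
        printf '%s\n' "$line"     # pass the whole line through unchanged
      fi
    done
  done
}

# Possible usage on macOS, with the env output already on the clipboard
# (the variable names and NEW_APP_NAME are placeholders):
#   pbpaste | filter_secrets DATABASE_URL SECRET_KEY_BASE | fly secrets import -a NEW_APP_NAME
```

Because everything stays in a pipe, the values still never touch disk or the terminal, which was the point of the original approach.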
Verdict
Very easy to do, besides the bug that threw me for a few hours. I think you should ABSOLUTELY do this if you’re running on v1. Fly has publicly stated that the v1 platform is not the future and that you should move.
Thanks for taking the time to write up your experience; this should be really helpful to us! Will try to add on here with our own once we’ve done it as well.
Just curious, since you’re running an Elixir app: I have a single VM in each of 5 regions and the startup time is about 40 seconds. I’m getting about a minute or so of downtime on a deploy. Are you having any issues with the rolling deploys?
All of my VMs are in one region (at least right now). Mainly for DB latency.
Looks like each VM takes ~9 seconds to deploy. So 5 in 40s would align.
I’m not sure why that would result in downtime though. Does it roll in each region at the same time, or one region after another? I don’t know your requirements, but I’d lean towards 2 machines per region rather than 1 (even if that meant 3 regions instead of 5).
The startup time for each Elixir app is roughly 30+ seconds, so having them all deploy within 40 seconds seems to be causing issues, as it leaves gaps while the proxy rebalances. I’ve also tried having multiple machines in the same region, but that didn’t work either.
Are you having any downtime on deploy, and how long does it take for Elixir to come up?
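To make the timing concern concrete, here is a rough worked calculation using the numbers from this thread (5 machines, about 9 seconds between machine replacements, about 30 seconds for the app to boot). It assumes each machine is replaced on a fixed interval without waiting for the previous one to become ready; that is an assumption for illustration, not a statement about how Fly schedules deploys. The function name max_unready is hypothetical.

```shell
#!/usr/bin/env bash
# Worst-case number of machines that are restarting but not yet ready,
# assuming one machine is replaced every `interval` seconds regardless of
# readiness and each machine needs `boot` seconds before it can serve.
max_unready() {
  local machines=$1 interval=$2 boot=$3
  local overlap=$(( (boot + interval - 1) / interval ))   # ceil(boot / interval)
  if [ "$overlap" -gt "$machines" ]; then
    overlap=$machines                                     # cannot exceed the fleet size
  fi
  echo "$overlap"
}

max_unready 5 9 30   # prints 4
```

Under those assumptions, up to 4 of the 5 machines could be booting at the same moment, which would be consistent with gaps during a deploy.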
Thank you, @sb8244! Case studies like this are a great resource and are in no way diminished by the publication of a generic guide in docs! I took your suggestion (and stole @jsierles’ script) and added an easier way to move secrets over, leveraging fly secrets import.
@jpramassini Would love to read how your migration goes, if you feel inclined to share.
My understanding is the rolling deploys won’t start the next until the health check passes. As such, there is no downtime during deployment for me. That said, I don’t know how this works multi-region. I believe it should still roll 1-by-1 because Machines seems to take a “no surprises” approach.
Do you have TCP / Health checks enabled for your app? If no, I could see that causing the issue because the machine will restart within a few seconds and then Fly will think it’s ready to go. With a health check, it should wait until it’s actually ready to go.
BTW: Make sure you have your app set to rolling and not immediate deployment.
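The gating described above, where the next machine is not touched until the previous one passes its check, can be sketched generically as follows. This only illustrates the idea and is not how Fly implements rolling deploys. rolling_update, replace_cmd, check_cmd, and POLL_SECONDS are hypothetical names, and the caller supplies the commands that actually replace and probe a machine.

```shell
#!/usr/bin/env bash
# Replace machines one at a time, moving on only after a readiness check passes.
# Usage: rolling_update REPLACE_CMD CHECK_CMD MAX_ATTEMPTS ID...
rolling_update() {
  local replace_cmd=$1 check_cmd=$2 max_attempts=$3
  shift 3
  local id attempt
  for id in "$@"; do
    "$replace_cmd" "$id" || return 1          # start the new version on this machine
    attempt=0
    until "$check_cmd" "$id"; do              # poll until the machine reports ready
      attempt=$((attempt + 1))
      if [ "$attempt" -ge "$max_attempts" ]; then
        echo "machine $id never became healthy; stopping rollout" >&2
        return 1                              # leave the remaining machines untouched
      fi
      sleep "${POLL_SECONDS:-1}"
    done
  done
}
```

A check command here could be a small function wrapping something like curl -fsS against the app's health endpoint. Without a check that reflects real readiness, a loop like this would move on as soon as the machine restarts, which matches the failure mode described above.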
Well, I ran “fly deploy” to deploy a single machine into one region, then “fly m update --memory 1024 --select”, and then “fly m clone --select --region new_region”.
That is what I am trying, but I keep getting “Error no config changes found”.
If I do status on the machine I see this:
flyctl machine status 148ededf1e5289 -a APP_NAME
Machine ID: 148ededf1e5289
Instance ID: 01GWV0J59MZ3GJ749VV20AJJE4
State: started
VM
ID = 148ededf1e5289
Instance ID = 01GWV0J59MZ3GJ749VV20AJJE4
State = started
Image = APP_NAME:deployment-01GWV0GRXBQJNYV8V4WT7PJ5J9
Name = long-feather-2800
Private IP = fdaa:0:5b60:a7b:f0f:c32f:c013:2
Region = sin
Process Group = app
CPU Kind = shared
vCPUs = 1
Memory = 256
Created = 2023-03-31T04:59:37Z
Updated = 2023-03-31T05:01:57Z
Command =
Volume = vol_0enxv309o0xv8okp
And when I try to scale it:
flyctl machine update 148ededf1e5289 -a APP_NAME -s shared-cpu-2x
Searching for image 'registry.fly.io/APP_NAME:deployment-01GWV0GRXBQJNYV8V4WT7PJ5J9@sha256:f48d9bc6b0556eabeb95a698cc2119af50e2c7316cef873b91f89b97a6b30bfe' remotely...
image found: img_e1zd4m9dkklv02yw
Image: registry.fly.io/APP_NAME-v2:deployment-01GWV0GRXBQJNYV8V4WT7PJ5J9
Image size: 151 MB
Error no config changes found
I get the same if I try to change the memory directly.
fly m update -a APP_NAME --memory 1024 --select
? Select a machine: 148ededf1e5289 long-feather-2800 (stopped, region sin, process group 'app')
Searching for image 'registry.fly.io/APP_NAME:deployment-01GWV0GRXBQJNYV8V4WT7PJ5J9@sha256:f48d9bc6b0556eabeb95a698cc2119af50e2c7316cef873b91f89b97a6b30bfe' remotely...
image found: img_e1zd4m9dkklv02yw
Image: registry.fly.io/APP_NAME:deployment-01GWV0GRXBQJNYV8V4WT7PJ5J9
Image size: 151 MB
Error no config changes found