I see the ams machines are created but then quickly destroyed:
The health check passes and the page loads (GET), but if I do a POST request nothing happens and I see this in the logs:
2023-11-15T18:24:59.520 app[1234] iad [info] Replaying: {
2023-11-15T18:24:59.520 app[1234] iad [info] pathname: '/login',
2023-11-15T18:24:59.520 app[1234] iad [info] method: 'POST',
2023-11-15T18:24:59.520 app[1234] iad [info] PRIMARY_REGION: 'ams',
2023-11-15T18:24:59.520 app[1234] iad [info] FLY_REGION: 'iad'
2023-11-15T18:24:59.520 app[1234] iad [info] }
2023-11-15T18:26:24.656 proxy[1234] fra [error] could not find a good candidate within 90 attempts at load balancing. last error: no known healthy instances found for route tcp/443. (hint: is your app shut down? is there an ongoing deployment with a volume or are you using the 'immediate' strategy? have your app's instances all reached their hard limit?)
If I check fly scale show I see that it’s set in iad:
NAME COUNT KIND CPUS MEMORY REGIONS
app 1 shared 2 512 MB iad
But I can’t do fly scale count 0 --region ams. It’s for staging and I don’t need two machines for it.
Yeah, one in iad, and sometimes it changes, but it’s never ams, which is what I’d prefer. I don’t have an [http_servers] section in my fly.toml.
Hmm, I just noticed that the error message says PRIMARY_REGION: 'ams'. I thought I removed that from fly.toml, but maybe it was cached in the builder machine’s Docker. Does that somehow tell fly that this should be a multi-region system? I mean, I hope it will be in the future, but this staging server should be just one machine.
Here’s the status:
Machines
PROCESS ID VERSION REGION STATE ROLE CHECKS LAST UPDATED
app 1857709a497738 15 iad started 2 total, 2 passing 2023-11-15T19:20:39Z
I think what’s happening here is a bit confusing, as fly deploy does a few things. First it builds the Docker image and pushes it to our registry, then it runs release commands on a new machine (if any), and finally it rolls out the image to any existing machines.
Right now, it looks like the only existing machine is in iad so we’ll want to add one in ams and then remove the one in iad.
Can you try scaling the app up in ams and down in iad, like this?
Oh yeah, you can scale down too, I didn’t think of that. I did the scaling and I got a machine in ams, but FLY_REGION is still set to iad and the POST request fails.
I cleared the Docker cache on the builder machine and deployed again, and made double-sure I had actually set FLY_REGION = 'ams' in the [env] section of my fly.toml, and that I don’t set PRIMARY_REGION anywhere. But something is always forcing FLY_REGION to iad, and PRIMARY_REGION to ams, as I can see from the logs:
2023-11-16T09:33:46.802 app[1234] iad [info] Replaying: {
2023-11-16T09:33:46.802 app[1234] iad [info] pathname: '/login',
2023-11-16T09:33:46.802 app[1234] iad [info] method: 'POST',
2023-11-16T09:33:46.802 app[1234] iad [info] PRIMARY_REGION: 'ams',
2023-11-16T09:33:46.802 app[1234] iad [info] FLY_REGION: 'iad'
2023-11-16T09:33:46.802 app[1234] iad [info] }
This is a Remix app loosely based on the blue-stack, and you can see how the replays are handled here in the Prisma db file and in the server.ts file. Is this setup messing with me? If I don’t set FLY_REGION and PRIMARY_REGION in my env, then fly guesses them? But I did have them set before… let me try to set them again and redeploy and see what it says.
Alright, I think I figured it out. I have to: fly secrets set FLY_REGION=ams PRIMARY_REGION=ams.
I hadn’t, because zod is validating my env schema and it didn’t throw an error on FLY_REGION or PRIMARY_REGION: if I don’t set those, fly apparently guesses them. I had set FLY_REGION in my toml and had run fly deploy --region ams, but it seems like those don’t count.
@matthewlehner follow-up question: what’s the correct way to set up a multi-region system? I get that PRIMARY_REGION should be set to something close to me (or my users), but will fly handle the value of FLY_REGION, or do I have to set that to the region of the other machine?
Thanks for that additional context! This helps a lot. It looks like you’re using replay headers to forward requests to the region where your app and database are running. Sweet!
I did a little bit of testing on my side of things and found that what’s happening is not obvious!
I hope I can provide a bit more clarity around what’s going on with those env vars. We should be setting both of these for you, automatically.
PRIMARY_REGION
The first thing I found is that using fly deploy with the --region flag doesn’t work in an obvious way. It essentially overrides the primary_region option you can set in fly.toml. You can set it like this:
app = "psl"
primary_region = "ams"
Once you’ve done this you should be able to stop using the --region flag in fly deploy.
I had a quick look at your code and it looks like you’ve got everything in place.
The code in your server.ts looks like it should replay the right kinds of requests, and your db setup is checking if it’s a readonly replica or the leader.
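For anyone following along, the replay decision can be sketched roughly as below. This is a minimal illustration, not the actual code from the linked server.ts: the names `replayTarget`, `RegionEnv` and `READ_ONLY_METHODS` are made up here, and it assumes that only non-read requests handled outside the primary region get replayed, using a `fly-replay` response header of the form `region=<code>`.

```typescript
// Hypothetical helper names. FLY_REGION and PRIMARY_REGION are the
// environment variables discussed in this thread.
type RegionEnv = { FLY_REGION?: string; PRIMARY_REGION?: string };

// Methods assumed to be read-only, so any region can serve them.
const READ_ONLY_METHODS = ["GET", "HEAD", "OPTIONS"];

// Returns the value for the fly-replay response header, or null when the
// current machine can handle the request itself.
function replayTarget(method: string, env: RegionEnv): string | null {
  const { FLY_REGION, PRIMARY_REGION } = env;
  const isWrite = READ_ONLY_METHODS.indexOf(method.toUpperCase()) === -1;
  // Replay only when both variables are set and they disagree.
  const isOutsidePrimary =
    Boolean(FLY_REGION) && Boolean(PRIMARY_REGION) && FLY_REGION !== PRIMARY_REGION;
  return isWrite && isOutsidePrimary ? `region=${PRIMARY_REGION}` : null;
}
```

With the values from the logs above (FLY_REGION 'iad', PRIMARY_REGION 'ams'), a POST would be replayed to ams while a GET would be served locally. If no healthy machine is running in ams at that point, the replay has nowhere to go, which would be consistent with the proxy error posted earlier.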
That said, in the logs that you posted with the replay request, it does look like the machine running the request is in iad – the logs have the region code as the third item after the timestamp and the app[machine_id]. It looks like you’ve solved this already though.
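To make the log layout concrete, here is a small sketch that pulls the region out of a log line. The field names and the regular expression are assumptions inferred only from the lines quoted in this thread (timestamp, then `source[id]`, then region, then `[level]`), not a documented format.

```typescript
// Hypothetical shape for one parsed log line.
interface FlyLogLine {
  timestamp: string;
  source: string; // e.g. "app" or "proxy"
  instance: string; // the ID inside the square brackets
  region: string; // region of the machine that wrote the line
  level: string;
  message: string;
}

// timestamp, source[instance], region, [level], then the rest of the line.
const LOG_LINE = /^(\S+) (\w+)\[([^\]]+)\] (\w+) \[(\w+)\] ?(.*)$/;

// Returns the parsed fields, or null if the line does not match the layout.
function parseLogLine(line: string): FlyLogLine | null {
  const m = LOG_LINE.exec(line);
  if (m === null) return null;
  return {
    timestamp: m[1],
    source: m[2],
    instance: m[3],
    region: m[4],
    level: m[5],
    message: m[6],
  };
}
```

Run against the `Replaying:` lines above, the `region` field comes out as `iad`, which is the region of the machine that handled the request, independent of whatever FLY_REGION value the app prints in the message itself.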