Region changes erratically, can't set permanent region

My fly.toml:

[env]
  PORT = "8080"
  METRICS_PORT = "8081"
  FLY_REGION = 'ams'

Then I tried to set the --region when I deploy:

run: flyctl deploy --region ams --remote-only --build-arg COMMIT_SHA=${{ github.sha }} --app ${{ steps.app_name.outputs.value }}

I see the ams machines are created but then quickly destroyed:

[Screenshot 2023-11-15 at 19.08.12]

The health check passes and the page loads (GET) but if I do a post request nothing happens and I see this in the logs:

2023-11-15T18:24:59.520 app[1234] iad [info] Replaying: {
2023-11-15T18:24:59.520 app[1234] iad [info] pathname: '/login',
2023-11-15T18:24:59.520 app[1234] iad [info] method: 'POST',
2023-11-15T18:24:59.520 app[1234] iad [info] PRIMARY_REGION: 'ams',
2023-11-15T18:24:59.520 app[1234] iad [info] FLY_REGION: 'iad'
2023-11-15T18:24:59.520 app[1234] iad [info] }

2023-11-15T18:26:24.656 proxy[1234] fra [error] could not find a good candidate within 90 attempts at load balancing. last error: no known healthy instances found for route tcp/443. (hint: is your app shut down? is there an ongoing deployment with a volume or are you using the 'immediate' strategy? have your app's instances all reached their hard limit?)

If I check fly scale show I see that it’s set in iad:

NAME    COUNT   KIND    CPUS    MEMORY  REGIONS
app     1       shared  2       512 MB  iad

But I can’t do fly scale count 0 --region ams. It’s for staging, and I don’t need two machines for it.

Hi there, it looks to me like you’ve only got one VM running in IAD at the moment.

Could you show the output of fly status? This will show a bit more detail and confirm whether or not this is true.

You may be running into some autoscaling behaviour from the min_machines_running setting in the [http_service] section of your fly.toml.

Yeah, one in iad, and sometimes it changes, but it’s never ams :smiley: Which is what I’d prefer. I don’t have an [http_service] section in my fly.toml.

Hmm, I just noticed that the log message says PRIMARY_REGION: 'ams'. I thought I had removed that from fly.toml, but maybe it was cached in Docker on the builder machine. Does that somehow tell fly that this should be a multi-region system? I mean, I hope it will be in the future, but this staging server should be just one machine.

Here’s the status:

Machines
PROCESS ID              VERSION REGION  STATE   ROLE    CHECKS                  LAST UPDATED
app     1857709a497738  15      iad     started         2 total, 2 passing      2023-11-15T19:20:39Z

Hmm… very strange!

I think what’s happening here is a bit confusing, as fly deploy does a few things. First it builds the Docker image and pushes it to our registry, then it runs release commands (if any) on a new machine, and finally it rolls out the image to any existing machines.

Right now, it looks like the only existing machine is in iad so we’ll want to add one in ams and then remove the one in iad.

Can you try scaling the app up in ams and down in iad, like this?

fly scale count 1 --region ams
fly scale count 0 --region iad

I’m hoping that will work! If it does, all your future deploys should be in ams!

Oh yeah, you can scale down too, I didn’t think of that. I did the scaling and I got a machine in ams, but FLY_REGION is still set to iad and the POST request fails.

I cleared the Docker cache on the builder machine and deployed again, and made double-sure I had actually set FLY_REGION = 'ams' in the [env] section of my fly.toml, and that I don’t set PRIMARY_REGION anywhere. But something is always forcing FLY_REGION to iad, and PRIMARY_REGION to ams, as I can see from the logs:

2023-11-16T09:33:46.802 app[1234] iad [info] Replaying: {
2023-11-16T09:33:46.802 app[1234] iad [info] pathname: '/login',
2023-11-16T09:33:46.802 app[1234] iad [info] method: 'POST',
2023-11-16T09:33:46.802 app[1234] iad [info] PRIMARY_REGION: 'ams',
2023-11-16T09:33:46.802 app[1234] iad [info] FLY_REGION: 'iad'
2023-11-16T09:33:46.802 app[1234] iad [info] }

Are these cached somewhere deep in the system?

Here’s my fly.toml:

app = "psl"
kill_signal = "SIGINT"
kill_timeout = 5
processes = []

[env]
  PORT = "8080"
  METRICS_PORT = "8081"
  FLY_REGION = 'ams'

[metrics]
  port = 8081
  path = "/metrics"

[deploy]
  release_command = "bash ./scripts/migrate.sh"

[[services]]
  internal_port = 8080
  processes = [ "app" ]
  protocol = "tcp"
  script_checks = [ ]

    [services.concurrency]
      hard_limit = 25
      soft_limit = 20
      type = "connections"

    [[services.ports]]
      handlers = [ "http" ]
      port = 80
      force_https = true

    [[services.ports]]
      handlers = [ "tls", "http" ]
      port = 443

    [[services.tcp_checks]]
      grace_period = "1s"
      interval = "15s"
      restart_limit = 0
      timeout = "2s"

    [[services.http_checks]]
      interval = "10s"
      grace_period = "5s"
      method = "get"
      path = "/healthcheck"
      protocol = "http"
      timeout = "2s"
      tls_skip_verify = false
      [services.http_checks.headers]

This is a Remix app loosely based on the blues-stack, and you can see how the replays are handled here in the Prisma db file and in the server.ts file. Is this setup messing with me? If I don’t set FLY_REGION and PRIMARY_REGION in my env, then fly guesses them? But I did have them set before… let me try to set them again, redeploy, and see what it says.

Alright, I think I figured it out. I have to: fly secrets set FLY_REGION=ams PRIMARY_REGION=ams.

I hadn’t noticed they were missing: zod validates my env schema, but it didn’t throw an error on FLY_REGION or PRIMARY_REGION, because if I don’t set those, fly apparently guesses them. I had set FLY_REGION in my fly.toml and had passed --region ams to fly deploy, but it seems like those don’t count.

@matthewlehner follow-up question: what’s the correct way to set up a multi-region system? I get that PRIMARY_REGION should be set to something close to me (or my users), but will fly handle the value of FLY_REGION, or do I have to set that to the region of the other machine?

Thanks for that additional context! This helps a lot. It looks like you’re using replay headers to forward requests to the region where your app and database are running. Sweet!

I did a little bit of testing on my side of things and found that what’s happening is not obvious!
I hope I can provide a bit more clarity around what’s going on with those env vars. We should be setting both of these for you, automatically.

PRIMARY_REGION

The first thing I found is that using fly deploy with the --region flag doesn’t work in an obvious way. It essentially overrides the primary_region option you can set in fly.toml. You can set it like this:

app = "psl"
primary_region = "ams"

Once you’ve done this you should be able to stop using the --region flag in fly deploy.

Here are the relevant docs if you’re interested: Fly Launch configuration (fly.toml) · Fly Docs

FLY_REGION

This will automatically be set to the three letter code of the region the machine is running in. Here are the docs: The Fly Runtime Environment · Fly Docs

For a multi-region setup

I had a quick look at your code and it looks like you’ve got everything in place.

The code in your server.ts looks like it should replay the right kinds of requests, and your db setup is checking if it’s a readonly replica or the leader.
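For anyone following along, here is a minimal sketch of the replay pattern being described. It is not the poster's actual server.ts: decideReplay and its types are hypothetical names, and the 409 status is an assumption borrowed from the pattern the Remix stacks use. The fly-replay response header with a region=... value is what asks the Fly proxy to re-run the request in another region.

```typescript
// Hypothetical helper, not the code from the poster's server.ts.
type RegionEnv = { FLY_REGION?: string; PRIMARY_REGION?: string };

type ReplayDecision =
  | { replay: false }
  | { replay: true; status: number; headers: Record<string, string> };

// Methods that only read data can be served from any region.
const READ_ONLY_METHODS = ["GET", "HEAD", "OPTIONS"];

function decideReplay(method: string, env: RegionEnv): ReplayDecision {
  const { FLY_REGION, PRIMARY_REGION } = env;
  const isWrite = !READ_ONLY_METHODS.includes(method.toUpperCase());
  const isOutsidePrimary =
    FLY_REGION !== undefined &&
    PRIMARY_REGION !== undefined &&
    FLY_REGION !== PRIMARY_REGION;

  if (!(isWrite && isOutsidePrimary)) {
    return { replay: false };
  }

  // Ask the Fly proxy to replay this request in the primary region.
  // The 409 status is an assumption; the header is the part that matters.
  return {
    replay: true,
    status: 409,
    headers: { "fly-replay": `region=${PRIMARY_REGION}` },
  };
}

// The situation from the logs above: a POST handled in iad while the primary is ams.
console.log(decideReplay("POST", { FLY_REGION: "iad", PRIMARY_REGION: "ams" }));
```

In a Node server you would pass process.env as the second argument. Note that if the only machine is in iad and PRIMARY_REGION is ams, every write gets replayed to a region with no machine, which would be consistent with the proxy error quoted earlier in the thread.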

That said, in the logs that you posted with the replay request, it does look like the machine running the request is in iad – the logs have the region code as the third item after the timestamp and the app[machine_id]. It looks like you’ve solved this already though.
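If you want to check this programmatically, a small sketch like the one below pulls the region out of a log line. The regular expression is an assumption that matches the lines as pasted in this thread (timestamp, source[id], region, [level], message); it is not an official log format specification, and parseLogLine is a hypothetical name.

```typescript
// Hypothetical parser for log lines shaped like the ones pasted in this thread.
type FlyLogLine = {
  timestamp: string;
  source: string;   // e.g. "app" or "proxy"
  instance: string; // the id inside the square brackets
  region: string;   // three-letter region code, e.g. "iad"
  level: string;    // e.g. "info" or "error"
  message: string;
};

const LOG_LINE = /^(\S+) (\w+)\[([^\]]+)\] (\w+) \[(\w+)\] ?(.*)$/;

function parseLogLine(line: string): FlyLogLine | null {
  const match = LOG_LINE.exec(line);
  if (match === null) {
    return null; // the line does not follow the assumed layout
  }
  const [, timestamp, source, instance, region, level, message] = match;
  return { timestamp, source, instance, region, level, message };
}

const parsed = parseLogLine(
  "2023-11-15T18:24:59.520 app[1234] iad [info] FLY_REGION: 'iad'"
);
console.log(parsed?.region);
```

The region field here is the region of the machine that wrote the line, which is the value to compare against PRIMARY_REGION when debugging replays.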

I hope this helps!

Left this hanging a bit. Thanks for the help, works as expected now! I overcomplicated the whole thing lol.
