Fly Machines Terraform feedback

dsiddharth · August 31, 2022, 10:51am

Hello,

I’ve tried Fly Machines in the past, but didn’t want to manage VMs only by using the API directly. So I was really excited to see a Terraform provider exists, which is awesome!

My use case was to move an app I had running in Fly already to Machines. Overall, I was able to achieve the goal with a few key notes:

1. Passthrough handler isn’t configurable. I opened a PR here to fix this: Allow service handlers to be empty for passthrough traffic by dsiddharth · Pull Request #76 · fly-apps/terraform-provider-fly · GitHub
2. For the fly_machine resource, only region and image are required. However, if cputype, cpus and memorymb are also not set, it results in a failure to start the Machine without a clear error.
3. Random flakiness. I hit errors a few times only to discover that a re-run with nothing changed succeeded.
4. Tunnel crashes when too many resources are being created/modified/deleted at the same time.
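
For readers following along with note 2, here is a minimal sketch of a fly_machine block with the guest sizing set explicitly. The attribute names (cputype, cpus, memorymb) are the ones mentioned in this post; the values are taken from the plan output shown later in this thread. The two variables are placeholders, and whether explicit sizing is actually required is an assumption that this thread does not settle.

```hcl
terraform {
  required_providers {
    fly = {
      source = "fly-apps/fly"
    }
  }
}

# Placeholder inputs, assumed for illustration only.
variable "app_name" { type = string }
variable "image_name" { type = string }

resource "fly_machine" "test" {
  app    = var.app_name
  region = "ewr"
  name   = "${var.app_name}-test-ewr"
  image  = var.image_name

  # Explicit guest sizing instead of relying on provider defaults.
  cputype  = "shared"
  cpus     = 1
  memorymb = 2048
}
```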

Ultimately, I was able to get what I needed done, but the experience was far from smooth and there were many points where I debated giving up.

DAlperin · August 31, 2022, 5:48pm

Hi @dsiddharth I’m the main developer of the terraform provider. I really appreciate the feedback, please keep it coming! I’d like to address your notes.

1. I actually ran into this myself yesterday. I pushed my local fix and it should be available in 0.0.14.
2. Can you try again and make sure you have the latest version? The fly_machine resource has pretty robust tests and I am not seeing any issue on the latest version with the configuration you describe.
3. This is a known issue. It comes down to the fact that the Machines API is still in beta and currently relies on another API behind the scenes that is flaky. The team is actively working on correcting this.
4. I’ll pass this report on to the right folks, although it seems like the tunnel will be optional soon.

Like I said please keep the feedback coming! I really appreciate it.

dsiddharth · September 1, 2022, 2:03pm

First off, huge thanks for building the provider! Without it, I wouldn’t have even been willing to try Machines in production. Glad to hear you found my feedback useful, so here’s some more:

1. Awesome, 0.0.14 fixed the empty handlers issue for me!
2. I wanted to test a new machine with empty cpus and memorymb but with this block:

resource "fly_machine" "test" {
  app    = var.app_name
  region = "ewr"
  name   = "${var.app_name}-test-ewr"
  image  = var.image_name
}

The plan completes successfully, but the apply fails with this message:

╷
│ Error: Failed to create machine
│ 
│   with fly_machine.test,
│   on main.tf line 42, in resource "fly_machine" "test":
│   42: resource "fly_machine" "test" {
│ 
│ Create request failed: 422 Unprocessable Entity, &{ID: Name: State: Region: InstanceID: PrivateIP: Config:{Env:map[] Init:{Entrypoint:[] Cmd:[]} Image: Metadata:<nil> Restart:{Policy:} Services:[] Mounts:[] Guest:{CPUKind: Cpus:0 MemoryMb:0}} ImageRef:{Registry: Repository: Tag: Digest: Labels:{}} CreatedAt:0001-01-01 00:00:00 +0000 UTC}
╵

Any idea why the request is being made with null/default values for all fields, even those that are explicitly set?

3. Figured this was the case since Machines overall feel very rough right now.
4. I was able to skip the flyctl machines api-proxy command, by updating the provider to use a custom fly_http_endpoint like so:

provider "fly" {
  fly_http_endpoint = "[_api.internal]:4280"
}

I see that flyctl can directly interact with Machines without WireGuard (e.g. flyctl machines list). Can I also just use this API endpoint with the Terraform provider instead of the internal API endpoint?

A new thing I’m noticing when running terraform apply with new resources is that some of the existing resources are being replaced, like so:

Terraform will perform the following actions:

  # fly_machine.coordinator["ewr"] must be replaced
-/+ resource "fly_machine" "coordinator" {
      ~ cpus     = 0 -> 1
      + cputype  = "shared"
      ~ env      = {} -> (known after apply)
      + id       = (known after apply)
      + image    = "registry.fly.io/hathora-games-coordinator:deployment-01GBT8TRAYG8QSPB41NZ4KQGE2"
      ~ memorymb = 0 -> 2048
      + name     = "hathora-games-coordinator-ewr" # forces replacement
      + region   = "ewr" # forces replacement
      ~ services = [
          + {
              + internal_port = 8080
              + ports         = [
                  + {
                      + handlers = [
                          + "tls",
                          + "http",
                        ]
                      + port     = 443
                    },
                  + {
                      + handlers = [
                          + "http",
                        ]
                      + port     = 80
                    },
                ]
              + protocol      = "tcp"
            },
          + {
              + internal_port = 7147
              + ports         = [
                  + {
                      + port = 7147
                    },
                ]
              + protocol      = "tcp"
            },
        ]
        # (1 unchanged attribute hidden)
    }

Seems like the two things forcing replacement are cpus (0 → 1) and memorymb (0 → 2048). This is surprising because when I created that machine, I already set the CPU to 1 and Memory to 2048. Could this be because the Machine is shut down or unallocated because of inactivity and Terraform doesn’t understand this?

DAlperin · September 1, 2022, 11:36pm

Let me look into the last part. As for the first part, can you try running this and telling me if it applies cleanly? (Replace the app name with your own.)

terraform {
  required_providers {
    fly = {
      source = "fly-apps/fly"
      version = "0.0.14"
    }
  }
}

provider "fly" {
  # Configuration options
}

resource "fly_machine" "test" {
  app    = "flyiac"
  region = "ewr"
  name   = "verbasic-test-ewr"
  image  = "nginx"
}

dsiddharth · September 2, 2022, 12:27am

In a new folder, that applied cleanly without any issues. It’s only been a problem for me when there’s an existing statefile with machines already running.

DAlperin · September 2, 2022, 1:55am

Hmm ok, I think this is some weirdness with how the provider interacts with the API. I’m actually just finishing up some fixes that will make it easier to debug these kinds of problems, so let me see where I get with that

dsiddharth · September 2, 2022, 11:52am

Awesome, keep me posted! Happy to switch to Discord or Slack if it’ll make it easier for us to share feedback faster.

hb9cwp · September 4, 2022, 10:11am

@DAlperin Currently, your minimal example above fails, both in EWR and in FRA. Also, if I add a resource "fly_app" "flyMachineMinimal" { ... } block, it creates the App but can’t create the Machine:

fly_machine.test: Creating...
╷
│ Error: Failed to create machine
│
│   with fly_machine.test,
│   on flyMachineMinimal.tf line 24, in resource "fly_machine" "test":
│   24: resource "fly_machine" "test" {
│
│ Create request failed: 404 Not Found, &{ID: Name: State: Region: InstanceID: PrivateIP: Config:{Env:map[]
│ Init:{Entrypoint:[] Cmd:[]} Image: Metadata:<nil> Restart:{Policy:} Services:[] Mounts:[] Guest:{CPUKind: Cpus:0
│ MemoryMb:0}} ImageRef:{Registry: Repository: Tag: Digest: Labels:{}} CreatedAt:0001-01-01 00:00:00 +0000 UTC}
$ fly version
flyctl v0.0.387 linux/amd64 Commit: d46c14f3 BuildDate: 2022-09-01T21:55:51Z
$ terraform version
Terraform v1.2.8
on linux_amd64
+ provider registry.terraform.io/fly-apps/fly v0.0.14

DAlperin · September 4, 2022, 5:12pm

Hmmm, can you check and make sure your fly_machine block includes depends_on = [fly_app.flyMachineMinimal]? Trying to create a machine for an app that does not exist would result in that 404.

hb9cwp · September 4, 2022, 5:50pm

Yes, the stanza has this dependency, and I had also deleted all Terraform files/subdir, then re-ran terraform init before terraform validate and terraform plan/apply:

resource "fly_app" "flyMachineMinimal" {
  name = "fly-machine-minimal"
  org  = "personal"     # "slug" in  $ fly list orgs
}

resource "fly_machine" "test" {
  app    = "flyMachineMinimal"
  region = "ewr"
  name   = "verbasic-test-ewr"
  image  = "nginx"
  depends_on = [fly_app.flyMachineMinimal]
}

I just retried the delete - init - apply cycle, with the same error as earlier today. The App gets created, but the dependent Machine throws the error message quoted above.

hb9cwp · September 4, 2022, 6:38pm

Could fly_machine be failing because the App is still pending? The error persists when I run terraform apply again after the App already exists from the first run.

$ fly list apps
  NAME                | STATUS  | ORG      | DEPLOYED      
----------------------*---------*----------*---------------
  fly-machine-minimal | pending | personal |               
  OOOO-nomados2       | running | personal | 3 months ago  
  OOOO-pxe1           | running | personal | 3 months ago  
$ fly status -a fly-machine-minimal
App
  Name     = fly-machine-minimal          
  Owner    = personal                     
  Version  = 0                            
  Status   = pending                      
  Hostname = fly-machine-minimal.fly.dev  
  Platform =                              

App has not been deployed yet.
$

Three Apps should still fit within the free tier, I guess. terraform destroy removes the App and all state.

dsiddharth · September 6, 2022, 10:00pm

@DAlperin any luck with the improved debuggability? We’re still hitting the issue when trying to create new Machines in an existing workspace, like so:

│ Error: Failed to create machine
│ 
│   with fly_machine.coordinator_green["mad"],
│   on main.tf line 1, in resource "fly_machine" "coordinator_green":
│    1: resource "fly_machine" "coordinator_green" {
│ 
│ Create request failed: 422 Unprocessable Entity, &{ID: Name: State: Region: InstanceID: PrivateIP: Config:{Env:map[]
│ Init:{Entrypoint:[] Cmd:[]} Image: Metadata:<nil> Restart:{Policy:} Services:[] Mounts:[] Guest:{CPUKind: Cpus:0 MemoryMb:0}}
│ ImageRef:{Registry: Repository: Tag: Digest: Labels:{}} CreatedAt:0001-01-01 00:00:00 +0000 UTC}

DAlperin · September 6, 2022, 10:44pm

Version v0.0.15 is working its way through the pipeline now. When it becomes available in a few minutes, can you upgrade to that, run terraform apply with debug logging enabled (DEBUG=1 TF_LOG=debug terraform apply), and provide the output as a paste or gist? If you aren’t comfortable sharing that publicly, please let me know, and I can provide an alternate means of communication.

DAlperin · September 6, 2022, 10:46pm

https://github.com/fly-apps/terraform-provider-fly/runs/8217576827?check_suite_focus=true

DAlperin · September 6, 2022, 10:56pm

Ok there we go: Terraform Registry

hb9cwp · September 6, 2022, 11:27pm

HTH, sending DEBUG output via PM as size here is limited to 32 kB, thank you.

DAlperin · September 6, 2022, 11:58pm

@hb9cwp Looks like you are creating an app with the name fly-machine-minimal but trying to reference the app name flyMachineMinimal when creating your machine. flyMachineMinimal is the resource label you set in Terraform (in the line resource "fly_app" "flyMachineMinimal"), not the actual name of the app you created (in the line name = "fly-machine-minimal"). Let me know if this helps.
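
One way to avoid mixing up the resource label and the app name is to reference the app's name attribute instead of repeating the string. The sketch below reuses the names from this thread. It assumes the provider handles an attribute reference the same way as a literal string, which has not been tested here. In Terraform, such a reference also creates an implicit dependency on the fly_app resource; keeping the explicit depends_on as well should be harmless.

```hcl
resource "fly_app" "flyMachineMinimal" {
  name = "fly-machine-minimal"
  org  = "personal"
}

resource "fly_machine" "test" {
  # Resolves to "fly-machine-minimal", the real app name,
  # rather than the Terraform resource label "flyMachineMinimal".
  app    = fly_app.flyMachineMinimal.name
  region = "ewr"
  name   = "verbasic-test-ewr"
  image  = "nginx"
}
```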

hb9cwp · September 7, 2022, 1:47am

@DAlperin Indeed, you are right, that fixes my self-inflicted problem, thank you very much!
So, your example works, in both EWR and FRA, if applied from a clean directory.
Here is the .tf completed with the App stanza and the Machine’s dependency on it:

terraform {
  required_providers {
    fly = {
      source = "fly-apps/fly"
      version = "0.0.15"
    }
  }
}

provider "fly" {
}

resource "fly_app" "flyMachineMinimal" {
  name = "fly-machine-minimal"
  org  = "personal"     # "slug" in  $ fly list orgs
}

resource "fly_machine" "test" {
  #app    = "flyMachineMinimal"      # <=== wrong reference!
  app    = "fly-machine-minimal"
  region = "ewr"
  name   = "verbasic-test-ewr"
  image  = "nginx"
  depends_on = [fly_app.flyMachineMinimal]
}

$ fly list apps
  NAME                | STATUS  | ORG      | DEPLOYED      
----------------------*---------*----------*---------------
  fly-machine-minimal | pending | personal |               
  OOO0-nomados2       | running | personal | 3 months ago  
  OOO0-pxe1           | running | personal | 3 months ago  

$ fly m list -a fly-machine-minimal

1 machines have been retrieved.
View them in the UI here (https://fly.io/apps/fly-machine-minimal/machines/)

fly-machine-minimal
ID              NAME                    STATE   REGION  IMAGE                   IP ADDRESS                      VOLUME  CREATED                     LAST UPDATED         
06e8262c115873  verbasic-test-ewr       stopped ewr     library/nginx:latest    fdaa:0:5f91:a7b:ab3:77e3:796f:2         2022-09-07T01:17:03Z        2022-09-07T01:17:05Z

$ fly logs -a fly-machine-minimal
Waiting for logs...
2022-09-07T01:55:56.178 runner[9080172c159287] fra [info] Reserved resources for machine '9080172c159287'
2022-09-07T01:55:56.183 runner[9080172c159287] fra [info] Pulling container image
2022-09-07T01:55:57.022 runner[9080172c159287] fra [info] Unpacking image
2022-09-07T01:55:57.242 runner[9080172c159287] fra [info] Configuring firecracker
2022-09-07T01:55:57.296 app[9080172c159287] fra [warn] Virtual machine exited abruptly
2022-09-07T01:55:58.351 runner[9080172c159287] fra [info] machine exited with exit code 0, not restarting

DAlperin · September 7, 2022, 2:22am

Glad it worked! Yeah, that particular footgun is a consequence of the terraform syntax in general. I’m thinking of trying to document a warning about that somewhere.

@dsiddharth I think your problem is different, so if you can send me the debug logs when you get a chance I can peek into that one as well!

dsiddharth · September 8, 2022, 5:44am

@DAlperin I had time to look into this further and it turns out I was trying to create new Machines with the same name as an already running Machine or a recently deleted one.

In some instances, I just had to let the recently deleted one get pruned from Fly’s records before recreating. But in all cases, if I picked a new name, I was able to launch new Machines without any issue.
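
A possible way to automate "pick a new name" is to append a generated suffix to the Machine name. This is only a sketch under assumptions: it uses the random_id resource from the hashicorp/random provider, the resource label machine_suffix and both variables are hypothetical, and the idea of regenerating the suffix whenever the image changes is one illustrative choice, not something confirmed in this thread.

```hcl
terraform {
  required_providers {
    fly    = { source = "fly-apps/fly" }
    random = { source = "hashicorp/random" }
  }
}

# Placeholder inputs, assumed for illustration only.
variable "app_name" { type = string }
variable "image_name" { type = string }

# Hypothetical helper: a new suffix is generated whenever the image changes,
# so a replacement Machine never reuses the previous Machine's name.
resource "random_id" "machine_suffix" {
  byte_length = 2
  keepers = {
    image = var.image_name
  }
}

resource "fly_machine" "coordinator_green" {
  app    = var.app_name
  region = "mad"
  name   = "${var.app_name}-green-mad-${random_id.machine_suffix.hex}"
  image  = var.image_name
}
```

Without a keepers entry the suffix would be generated once and then stay fixed, which would not help when a Machine is destroyed and recreated under the same configuration.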

I think the only thing that can be improved is the error message of the underlying API. Today it gives a generic 422 Unprocessable Entity, but it must know that the failure is caused by a name conflict.

Thanks for the quick responses and support here!
