fly.io terraform provider

Hey folks, I’ve started writing a terraform provider for fly. I wanted to gauge community/company interest and see what features people would be looking for and what I should attack first. I’ve attached a really pared-down demo of a fly_app data provider that can be used to retrieve state information about one of your apps. This is just so people can get a general idea (I have stripped a bunch of features from this demo because they don’t work perfectly yet). Next up: creating said apps programmatically.

Take a look! I’d love to hear what people think!

asciicast

EDIT: here is a more complete example of the ‘fly_app’ data provider:
asciicast

cc: @hb9cwp and @ajbouh, I know both of you were interested in this at some point.

Added a fly_app resource to programmatically provision apps with terraform:
asciicast

@DAlperin Thank you, great stuff! Notably the third example looks very promising. Looking forward to seeing more. I am happy to assist with testing whenever you are ready to get external feedback.

That would be great!

What providers would you most like to see? Currently the provider for fly apps is mostly in good shape. Volumes are next. Those two cover a lot of my requirements but would be happy to add more.

I’ll probably publish an alpha mid next week.

Looking forward to trying your provider on Terraform Cloud (TFC)!

“What providers would you most like to see?”
Equivalents for fly launch/deploy, as well as (a subset of) fly info and/or fly ips list, would be great.

However, I am unsure if this is a reasonable request. Apparently, the objective of Fly is “CLI first”, whereas HashiCorp is “API first”.

This would allow me to set up CI/CD pipelines using repos on GitHub/GitLab with Workspaces on TFC, and also have Terraform update DNS records on the authoritative DNS servers of my domains depending on the output of info|ips list :slight_smile:

Finally, have you come across any CLI or API from Fly that would enable us users to change the TTL of and/or delete AAAA records within the *.fly.dev. domain? Thank you.

This basically all seems doable, with a few caveats:

  • You can create deployments with terraform but you need to build and push your image to the fly registry first
  • I haven’t looked into the DNS record thing but I don’t recall seeing anything(?) I’ll look.

Otherwise, yeah, everything you described should be possible with the alpha. IPs and apps can be resources (with the apps one already being done), and fly info/fly ips list etc. will be data providers that are just a loose wrapper around the API.

This is super cool. You should check out machines (fly machine run ...). It’s in early preview, but it gives you control over individual VMs.

I saw that. Seems super cool! I think I’ll skip it for the alpha. For the alpha I’m planning to have:

Resources

  • apps
  • volumes
  • ips
  • certificates
  • releases

Data sources

  • apps
  • volumes
  • ips
  • certificates
  • releases
  • users
  • organizations

And also maybe DNS stuff if I can figure out how to convince the fly API to do that with minimal pain, but no promises there.

@hb9cwp Do you have a matrix (preferably) or discord username if you don’t mind sharing? I’d like to maybe run some ideas for the API by you, since you have the only other real use case (I know of) besides mine so far.

Also @kurt, I’d love to chat at some point once I get this a little further along. I’d love to keep the provider in line with the Fly team’s vision so it feels as seamless as possible.

You can find a summary of my current “use case” which is still an experiment and personal project to learn about Fly, Docker builds, and another experiment “NomadOS” which turns the HashiStack upside down.

Currently, I have got TFTP servers and custom iPXE builds working. After some polishing, I will re-use the same scaffolding for the other half: building NomadOS images and serving them from static HTTPS servers.

Eventually, I will try and merge them into a multi-process app to save resources on Fly initially. But maybe it is reasonable to keep them apart, as the two halves provide two different services which are decoupled from each other, and each of them has its own life cycle. Scalability is less an issue, unless NomadOS should take off one day, maybe once HashiCorp releases HCP Nomad.

Your Terraform Provider for Fly will fit nicely with my other workflows over at Terraform Cloud.
I’ll PM you.

Hey @kurt, is there any way we could set up some time to chat next week? I am just about ready to release a v1 of the provider, but I have run up against some weird, opaque API behavior that is blocking me. Basically, it seems like there are a bunch of race conditions in the backend that happen when you create resources quickly or in tandem (understandably, the synchronous nature of the CLI has masked these issues). Because I don’t have a view into what is actually happening on the backend, I have run into a troubleshooting wall. I’d love to work together to try and find some solutions for these issues so that we can release a v1 of the provider into the world!

Feel free to reach out here or shoot me an email at dov(at)dov(dot)dev

Thanks!

(also cc: @jsierles)

Can you add us to a repo to check it out? That would be the easiest way to help you asynchronously. Or, feel free to drop specific questions here. The convo is likely useful for anyone working with our API.

I have to get the repo cleaned up a little but will share soon.

Currently the biggest weird problem I’m running into is as follows:

  1. Create app using createApp mutation.
  2. Immediately afterwards, fire concurrent requests to:
    a. create a volume on the app
    b. create cert on the app
    c. create ipv4 on the app
    d. create ipv6 on the app

Here is the problem. Either the ipv4 or ipv6 creation will return this graphql response:

{
  "error": {
    "errors": [
      {
        "message": "An unknown error occured.",
        "extensions": {
          "code": "SERVER_ERROR"
        }
      }
    ],
    "data": {}
  }
}

The thing is that both addresses get created, but one of the requests returns a response claiming that it was not. This is problematic since it confuses Terraform’s internal state management to no end.
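For anyone hitting the same thing, here is a minimal Go sketch of one way a client could cope with a response like the one above: treat SERVER_ERROR as "outcome unknown" and re-read the resource before deciding whether the create succeeded. The struct shapes only mirror the JSON pasted above, and resolveCreate and the exists callback are hypothetical names for illustration, not the provider's actual code or a documented Fly API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// gqlError mirrors one entry of the "errors" array in the response above.
type gqlError struct {
	Message    string `json:"message"`
	Extensions struct {
		Code string `json:"code"`
	} `json:"extensions"`
}

// gqlEnvelope mirrors the outer shape of the response above.
type gqlEnvelope struct {
	Error struct {
		Errors []gqlError `json:"errors"`
	} `json:"error"`
}

// resolveCreate decides whether a create call actually took effect.
// exists is a hypothetical callback that re-reads the resource from the API.
func resolveCreate(body []byte, exists func() (bool, error)) (bool, error) {
	var env gqlEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return false, fmt.Errorf("decoding response: %w", err)
	}
	if len(env.Error.Errors) == 0 {
		return true, nil // no errors reported
	}
	for _, e := range env.Error.Errors {
		if e.Extensions.Code != "SERVER_ERROR" {
			// Any other code is treated as a definite failure.
			return false, fmt.Errorf("create failed: %s", e.Message)
		}
	}
	// Only SERVER_ERRORs: the outcome is unknown, so ask the API.
	return exists()
}

func main() {
	body := []byte(`{"error":{"errors":[{"message":"An unknown error occured.","extensions":{"code":"SERVER_ERROR"}}],"data":{}}}`)
	// Stand-in for a real lookup of the app's IP addresses.
	lookup := func() (bool, error) { return true, nil }
	created, err := resolveCreate(body, lookup)
	fmt.Println(created, err)
}
```

This only papers over the symptom on the client side. Whether a follow-up read is reliable right after the failed response is itself an assumption.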

It seems to happen only when multiple IP creations run concurrently. Firing the IP requests manually in the playground one after the other, each only once the previous one has completed, works. It sort of feels like something is grabbing a read lock on wherever your IP source of truth is, but that’s just me in the peanut gallery wondering.

Edit:
For those curious the current workaround I am using is this:

resource "fly_ip" "exampleIp" {
  app        = "hellofromterraform"
  type       = "v4"
  depends_on = [fly_app.exampleApp]
}

resource "fly_ip" "exampleIpv6" {
  app        = "hellofromterraform"
  type       = "v6"
  # Explicitly chain after the IPv4 resource so the two IP creations never run concurrently
  depends_on = [fly_ip.exampleIp]
}

I was also wondering if the concept of processGroups is defined anywhere?

If this is useful to anyone in the future here is the dumped graphql schema for the api: fly-schema.graphql · GitHub

You can also always get it from here: GraphQL Playground

On the right click on “Schema”
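
If you would rather pull type information programmatically than click through the playground, a standard GraphQL introspection query is one option. The Go sketch below only builds the request. The endpoint URL in main, the Bearer-token header and the placeholder token are assumptions rather than something confirmed in this thread, and newIntrospectionRequest is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// introspectionQuery is a small, standard GraphQL introspection query that
// lists the names and kinds of all types in the schema.
const introspectionQuery = `{ __schema { types { name kind } } }`

// newIntrospectionRequest builds (but does not send) a POST request carrying
// the introspection query as a JSON body.
func newIntrospectionRequest(endpoint, token string) (*http.Request, error) {
	payload, err := json.Marshal(map[string]string{"query": introspectionQuery})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	// Assumption: the API accepts a personal access token as a Bearer token.
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	// Assumed endpoint and placeholder token; substitute your own values.
	req, err := newIntrospectionRequest("https://api.fly.io/graphql", "YOUR_TOKEN_HERE")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
}
```

Sending it is then a matter of passing the request to http.DefaultClient.Do and decoding the JSON response.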

Totally missed the download button! That’s so helpful, thank you!

Update, since I’m kind of sort of using this as a dev log now I guess.

Just hacked together userspace wireguard in the provider (akin to flyctl) so I can operate on postgres databases from the provider. That stole an hour or two of time and I suspect it will cost me some more considering I haven’t tested it yet. I’m hoping to be able to open source an alpha by Thursday. Hopefully I’ll have it in the terraform plugin registry by then too.

:wave: This is sounding pretty great.

I wound up stumbling into some of our error reporting that appeared related to your work. A lot of our tools use the API in a very specific way and you seem to have uncovered some gaps by using it differently! That’s great though, it allows us to make it better.

I’ve addressed as many of those issues as I could find, so hopefully you’ll run into fewer of those server errors. I’ve also added some extra logging context in some of these spots, which should give us more insight to resolve them if any come back up.

Excited to see what you put together!
