RabbitMQ on Machines

I’m trying to run RabbitMQ on Fly. I had some issues with the regular Apps V1 architecture, so I figured I’d give Machines a try since they are more configurable.

I landed on using the Fly Terraform provider, and it has made reasoning about deployment a little easier. Every step seems to run to completion, except that when the machines hosting Rabbit start, they are killed immediately afterward, with very little info in the logs:

2022-12-23T19:55:36Z app[21781972a07789] lax [warn]Virtual machine exited abruptly
2022-12-23T19:55:37Z runner[21781972a07789] lax [info]machine exited with exit code 0, not restarting

Below is my terraform file for deploying rabbit:

terraform {
  required_providers {
    fly = {
      source  = "fly-apps/fly"
      version = "0.0.16"
    }
  }
}

provider "fly" {
}

variable "appName" {
  type    = string
  default = "service-bus"
}

variable "nodes" {
  type    = list(string)
  default = ["node1", "node2"]
}

locals {
  cluster_name = "rabbitmq-${var.appName}"
}

# Create app
resource "fly_app" "serviceBus" {
  name = var.appName
  org  = "some-org"
}

resource "fly_ip" "serviceBusIp" {
  app        = var.appName
  type       = "v4"
  depends_on = [fly_app.serviceBus]
}

resource "fly_ip" "serviceBusIpv6" {
  app        = var.appName
  type       = "v6"
  depends_on = [fly_app.serviceBus]
}

# Create volumes
resource "fly_volume" "serviceBusVolumes" {
  for_each   = toset(var.nodes)
  app        = var.appName
  name       = "${each.value}"
  size       = 1
  region     = "lax"
  depends_on = [fly_app.serviceBus]
}

# For each volume create a node
resource "fly_machine" "rabbitBus" {
  for_each = fly_volume.serviceBusVolumes
  app      = var.appName
  region   = "lax"
  name     = "rabbit_${each.value.id}"
  image    = "registry.fly.io/service-bus:amd" # my custom rabbit image that adds the .cookie file to the image
  env      = {
    RABBITMQ_NODENAME    = "${each.value.id}@rabbithost"
    RABBITMQ_MNESIA_DIR  = "/var/lib/rabbitmq/mnesia/data"
    RABBITMQ_CONFIG_FILE = "/etc/rabbitmq/rabbitmq.conf"
    RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS = "-kernel inetrc '/etc/rabbitmq/erl_inetrc' -proto_dist inet6_tcp"
  }
  mounts   = [{
    path   = "/var/lib/rabbitmq/mnesia/data"
    volume = each.value.id
  }]
  services = [
    {
      ports = [
        {
          port     = 15672
          handlers = ["http", "tls"]
        },
        {
          port     = 5672
          handlers = ["http"]
        }
      ],
      protocol      = "tcp"
      internal_port = 5672
    }
  ]
  cpus       = 1
  memorymb   = 512
  depends_on = [fly_volume.serviceBusVolumes]
}

I’m not quite sure why it’s not starting. My hunch is that something in the networking is preventing the nodes from starting up, but if anyone has a clue I’d be grateful. I know quite a few people are trying to deploy Rabbit on Fly and don’t know how to.

Sorry, I have no answer for you here, but I do wonder how you intended the RABBITMQ_NODENAME values to be reachable. I don’t see how “rabbithost” would work, in that a) both of your nodes would have the same hostname, and b) where would that hostname get registered?

I’m stuck in this spot myself at the moment. When you spin up a machine, the only way to uniquely identify it is by the randomly generated ID. But I need to know what that hostname is going to be ahead of time if I’m going to configure Rabbit for clustering.

Do I need to spin them up, and then run some clustering commands once all the nodes are up to allow them to find one another?

(Also, I wonder if your mgmt port handlers might need to be “tls”, “http”, if the order matters, which it seems like it might.)
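
To make the handler suggestion concrete, here is a rough sketch of how the services list in the fly_machine resource might be split up. It is untested and rests on a few assumptions: that the provider accepts an empty handlers list for plain TCP passthrough, that the management plugin is enabled in the image and listening on its default port 15672, and that each public port should forward to its own internal port rather than both going to 5672. AMQP on 5672 is not HTTP, so it gets no "http" handler here. None of this necessarily explains the abrupt exit, it only addresses the port and handler setup.

```hcl
# Sketch only: a possible replacement for the `services` attribute
# inside resource "fly_machine" "rabbitBus".
services = [
  {
    # AMQP is a raw TCP protocol, so no HTTP handler on this port.
    # Assumption: an empty handlers list means plain TCP passthrough.
    ports = [
      {
        port     = 5672
        handlers = []
      }
    ]
    protocol      = "tcp"
    internal_port = 5672
  },
  {
    # Management UI: TLS listed first, then HTTP, forwarded to
    # RabbitMQ's default management port inside the machine.
    ports = [
      {
        port     = 15672
        handlers = ["tls", "http"]
      }
    ]
    protocol      = "tcp"
    internal_port = 15672
  }
]
```

If the order of handlers turns out not to matter, the only real change is giving each port its own service entry so the management port no longer forwards to 5672.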