CockroachDB on Fly.io?

New to Fly.io tbh… and I’ve been learning/playing around with CockroachDB. I’ve been testing it on GKE for the time being, but was really curious if it would be possible to run on Fly!

I know that Fly offers persistent disk space… right? Are these disks fast enough to run CockroachDB? I know GKE has different types of disks you can choose from…

Has anyone experimented with CockroachDB on Fly yet, and if so what was your experience?

I know that you can run CockroachDB from a Docker image (with HA setups too)… they have some examples here: cockroachlabs-field/docker-examples on GitHub (CockroachDB examples using Docker and Docker Compose)

Does anyone know of any issues with running it on Fly?

Thanks in advance for any info :slight_smile:

It’s possible! It’s simpler to run in insecure mode, since all network traffic on Fly’s private network is already encrypted.

This is probably the best one to start with: docker-examples/docker-compose.yml at master · cockroachlabs-field/docker-examples · GitHub

The simplest thing here is probably to run a single app with start-single-node, and then a second app with start + join referencing the first one.

Is there any way we could get an official single-node and HA example on Fly? I think a lot of people would be interested in deploying CockroachDB on Fly as well! :blush:

Turns out, it’s not all that hard to get going! Here’s an example multi-region CockroachDB setup on Fly. It doesn’t use HAProxy or anything, so your app will need to connect directly to the nodes for it to work right.

You will need today’s release of flyctl for these instructions to work.

Thank you so much!!! :heart_eyes:

You should never run a production workload on an insecure cluster, as many security features are disabled in insecure mode.

Take a look here for more information: Production Checklist | CockroachDB Docs

A couple of other things to note:

CockroachDB requires more than 50% of the nodes in the cluster to be alive for the cluster to operate. With the current example there is no way to ensure more than 50% of the nodes are kept alive during a deploy / upgrade.

In the example, no health checks are set up to ensure that unhealthy nodes are replaced, and a node can become unhealthy without the CockroachDB process exiting.

I think this was a quick first pass at how it’s possible to deploy CockroachDB on Fly.

The service is not exposed through the public internet, only through our private encrypted network.

However, the caveats in that list aren’t too good:

  • Any user, even root , can log in without providing a password.

Yikes.

Our deployment process uses a rolling strategy whenever volumes are attached. That means we upgrade the nodes one by one, waiting for each to come back up as healthy before moving on to the next.
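To make the interaction between a one-at-a-time deploy and the "more than 50% of nodes alive" rule concrete, here is a minimal sketch. It takes the majority rule as stated earlier in the thread at face value; the function names are hypothetical and are not part of CockroachDB or flyctl.

```python
def has_quorum(live_nodes: int, total_nodes: int) -> bool:
    """Return True when strictly more than half of the nodes are alive."""
    if total_nodes <= 0 or not 0 <= live_nodes <= total_nodes:
        raise ValueError("expected 0 <= live_nodes <= total_nodes and total_nodes > 0")
    # Integer comparison avoids floating-point rounding at exactly 50%.
    return 2 * live_nodes > total_nodes


def survives_rolling_restart(total_nodes: int, down_at_once: int = 1) -> bool:
    """Check whether a majority remains while `down_at_once` nodes restart."""
    return has_quorum(total_nodes - down_at_once, total_nodes)


if __name__ == "__main__":
    for size in (1, 2, 3, 5):
        print(size, survives_rolling_restart(size))
```

Under this rule a rolling restart only preserves a majority from three nodes upward; a two-node cluster drops to exactly 50% while one node is down.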

That’s a good point, we can add a script check for cockroach’s unhealthiness. A check failure means a restart.
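A script check along those lines could look something like the sketch below. It is not the check from the example repo; it assumes the node serves CockroachDB's HTTP readiness endpoint (`/health?ready=1`) on the default HTTP port 8080, and the function names are made up for illustration.

```python
import urllib.error
import urllib.request

# Assumption: the node's HTTP interface listens on the default port 8080.
HEALTH_URL = "http://localhost:8080/health?ready=1"


def node_is_ready(url: str = HEALTH_URL, timeout: float = 2.0,
                  opener=urllib.request.urlopen) -> bool:
    """Return True only if the readiness endpoint answers with HTTP 200."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response (urlopen raises
        # HTTPError for those): treat all of them as "not ready".
        return False


def main() -> int:
    """Exit code for a script check: 0 means healthy, 1 means unhealthy."""
    return 0 if node_is_ready() else 1

# When used as a standalone check script, finish with:
#     raise SystemExit(main())
```

The `opener` parameter exists only so the function can be exercised without a running node.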

It’s true! This is just an example (although PRs are acceptable).

CockroachDB is working on making --insecure less of a bazooka. mTLS is dumb and not necessary on Fly. Once they get that cleaned up we should be able to add auth and keep the example simple.

I would disagree that mTLS is dumb. It’s a straightforward way to verify that two nodes are indeed part of the same cluster and have the correct authority. It’s especially useful when you expose the database to the Internet so that third-party services / vendors can connect to the database (CockroachDB shares the SQL and node comms on the same port).

The referenced issue wouldn’t necessarily remove mTLS, simply make it easier to work with, more transparent and less cumbersome.

I might get some time over the next week where I can upgrade the example to a full secure cluster in a way that’s similar to our deployment on AWS Fargate.

One issue / limitation that I’ve encountered with Fargate (which as far as I know fly.io has too) is the lack of a way to add secret files to the image. In the case of CockroachDB, we base64-encode the certificates and keys before storing them in the secrets; the image entry script then decodes them, stores them on disk, and passes those paths to the CockroachDB process.
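As a rough illustration of that entry-script approach (not the actual Fargate script), the sketch below decodes base64 secrets from environment variables into a certs directory and builds the `cockroach start` command line. The environment variable names and helper functions are hypothetical; `ca.crt`, `node.crt` and `node.key` follow CockroachDB's usual certificate file naming.

```python
import base64
from pathlib import Path

# Hypothetical secret names, mapped to the file each one is decoded into.
SECRET_FILES = {
    "COCKROACH_CA_CRT_B64": "ca.crt",
    "COCKROACH_NODE_CRT_B64": "node.crt",
    "COCKROACH_NODE_KEY_B64": "node.key",
}


def write_certs(env, certs_dir):
    """Decode each base64 secret from `env` and write it into `certs_dir`."""
    target = Path(certs_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for variable, filename in SECRET_FILES.items():
        path = target / filename
        path.write_bytes(base64.b64decode(env[variable]))
        path.chmod(0o600)  # keep key material readable by the owner only
        written.append(path)
    return written


def cockroach_command(certs_dir, join):
    """Build the argument list that points CockroachDB at the decoded certs."""
    return ["cockroach", "start", f"--certs-dir={certs_dir}", f"--join={join}"]

# In a real entry script this would typically end with something like:
#     write_certs(os.environ, "/cockroach/certs")
#     cmd = cockroach_command("/cockroach/certs", join_addresses)
#     os.execvp(cmd[0], cmd)
```

A missing secret raises `KeyError` here, which makes the container fail fast rather than start a node without its certificates.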

Ok you’re right. mTLS is generally dumb but in Cockroach land it’s useful. :slight_smile:

--sql-addr and --accept-sql-without-tls will be helpful at least. Fingers crossed they remove the mTLS requirement between cockroach nodes soon and just let you use a pre-shared RPC key.

You’re running into part of my malign influence on Fly. :slight_smile:

Here’s an HN thread if you’re interested in the evil ideas I’m spreading here:

https://news.ycombinator.com/item?id=25380301

(Part of what we don’t like about mTLS is that between WireGuard and eBPF, we’ve got 95% of the value of mTLS but with native sockets).
