How to run a Typesense HA Cluster on Fly?

Hi all, I am currently running a single-node Typesense machine on Fly but would like to turn it into a cluster. My current single-node workaround (needed because Typesense only supports IPv4) comes from this Fly.io thread: it uses socat and supervisor to route Fly's IPv6 traffic to Typesense over IPv4.

However, with multiple nodes this gets a bit complicated. On the Typesense GitHub, someone described how they got a cluster working with 6tunnel: they "tricked" each node into communicating locally(?) and then used 6tunnel to route that traffic to the other nodes. You can find the post here

But in summary I’d need to run:

# External traffic: listen on IPv6 port 8108 and forward to this node's API port.
6tunnel -6 -l :: 8108 localhost $api_port
# Cross-node traffic: listen on this node's private IPv6 address and forward its API and peering ports locally.
6tunnel -6 -l <private v6 ip> $api_port localhost $api_port
6tunnel -6 -l <private v6 ip> $peering_port localhost $peering_port

# API and Peering tunnels for other nodes
6tunnel -4 -l localhost <node 2 api port> <node 2 private v6 ip>
6tunnel -4 -l localhost <node 2 peering port> <node 2 private v6 ip>
6tunnel -4 -l localhost <node 3 api port> <node 3 private v6 ip>
6tunnel -4 -l localhost <node 3 peering port> <node 3 private v6 ip>
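The summary above can be generated from a per-node description instead of being typed out for every node. A minimal sketch, assuming a hypothetical `Node` record holding each node's private IPv6 address (or a hostname that resolves to it), API port, and peering port; it only builds the 6tunnel argument lists in the same order as the commands above and does not run anything:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one Typesense node."""
    private_ip: str    # private IPv6 address, or a hostname resolving to it
    api_port: int      # value passed to --api-port on that node
    peering_port: int  # value passed to --peering-port on that node


def tunnel_commands(me: Node, peers: List[Node], public_port: int = 8108) -> List[List[str]]:
    """Build the 6tunnel argument lists from the summary, without running them."""
    cmds = [
        # External traffic: all IPv6 addresses on the public port -> local API port.
        ["6tunnel", "-6", "-l", "::", str(public_port), "localhost", str(me.api_port)],
        # Cross-node traffic arriving on this node's private IPv6 address.
        ["6tunnel", "-6", "-l", me.private_ip, str(me.api_port), "localhost", str(me.api_port)],
        ["6tunnel", "-6", "-l", me.private_ip, str(me.peering_port), "localhost", str(me.peering_port)],
    ]
    for peer in peers:
        # Local listeners on the peer's port numbers, forwarded to the peer.
        # Like the summary, these omit the remote port, so 6tunnel reuses
        # the local port number on the remote side.
        cmds.append(["6tunnel", "-4", "-l", "localhost", str(peer.api_port), peer.private_ip])
        cmds.append(["6tunnel", "-4", "-l", "localhost", str(peer.peering_port), peer.private_ip])
    return cmds
```

Each list could then be handed to `subprocess.run(cmd, check=True)` on the matching node. Note that each node's own ports and every peer's ports all end up bound on the same localhost, which matters later in this thread.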

I honestly don't know which Fly IPs I should put here, and I'm hoping that referencing the internal URL Fly provides accomplishes the same thing. Here's what I currently have, but the nodes don't seem to communicate correctly. It's hardcoded in Go for now (I ripped most of it from the nats-cluster example you created). This example is a LAX node that should communicate with another node in SJC:

exec.Command("6tunnel", "-6", "-l", "::", "8108", "localhost", "8062").Run()

exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8062", "localhost", "8062").Run()
exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8107", "localhost", "8107").Run()

exec.Command("6tunnel", "-4", "-l", "localhost", "8063", "sjc.foundry-typesense-cluster.internal").Run()
exec.Command("6tunnel", "-4", "-l", "localhost", "8107", "sjc.foundry-typesense-cluster.internal").Run()

svisor.AddProcess(
	"typesense-server",
	"typesense-server --data-dir=/data --api-key=xxx --api-port=8062 --peering-port 8107 --nodes=/etc/typesense-nodes --reset-peers-on-error",
	supervisor.WithRestart(0, 1*time.Second),
)

There's also a peering-address flag, which expects the node's private IPv4 address. I have no idea how to get one, given that Fly's internal network uses IPv6. When I don't set the flag, Typesense binds to a 172.x address.

Each machine also needs a typesense-nodes file so that Raft knows the IPs and ports of every node in the cluster. I've set it as follows (the first entry is the LAX node, the second the SJC node):

localhost:8107:8062,localhost:8107:8063

I used localhost because I assumed 6tunnel would route traffic on the unique ports (8062 and 8063) to the respective node.
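For reference, the nodes file is a single comma-separated line of `address:peering_port:api_port` entries, which is the format Typesense's clustering docs describe. A small sketch that parses such a line, which is handy for sanity-checking what each machine will read. `parse_nodes` and `NodeEntry` are hypothetical names. The split uses `rsplit` so any colons inside the host part are left intact; whether Typesense itself accepts a raw IPv6 address there is not something this checks.

```python
from typing import List, NamedTuple


class NodeEntry(NamedTuple):
    host: str
    peering_port: int
    api_port: int


def parse_nodes(line: str) -> List[NodeEntry]:
    """Parse 'host:peering_port:api_port,...' into NodeEntry tuples.

    Raises ValueError for entries that lack two ports or have non-numeric ports.
    """
    entries = []
    for raw in line.strip().split(","):
        raw = raw.strip()
        if not raw:
            continue  # tolerate a trailing comma
        host, peering, api = raw.rsplit(":", 2)
        entries.append(NodeEntry(host, int(peering), int(api)))
    return entries
```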

The complete HA documentation can be found here

This is sort of a hybrid Fly.io / Typesense question, so I hope I've provided enough background. I would really appreciate any insight or suggestions. Thank you!!

I don't have this setup, but it seems like the peering port should probably be unique per server, right? Maybe try 8107 for LAX and 8109 for SJC?

@ianjosephwilson That's a good question. I was under the impression that the peering port can be the same, since the example nodes in the docs all use the same port number because they're on different hosts too. Example below:

I'm not sure I fully understand how 6tunnel, the networking, and Typesense peering fit together, but I think the problem is that you are overloading the same IPv4 port on localhost. Using the same port on every host would be fine if you were connecting to them over a regular IPv4 network, as in that example.

I.e., your LAX Typesense server is running its peering port on localhost:8107, AND you are also trying to tunnel localhost:8107 to the SJC Typesense host at the same time.

Maybe just try it and see if it makes a difference. You could also use different local and remote ports, but I think that would make each server's configuration more confusing and leave less of it shareable.
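The point above can be checked mechanically. When every entry in the nodes file points at localhost, each node binds its own API and peering ports and also opens a local tunnel on every other node's ports, so every port number in the file must be distinct. A minimal sketch of that check (`port_collisions` is a hypothetical helper):

```python
from collections import Counter
from typing import Dict


def port_collisions(line: str) -> Dict[int, int]:
    """Return {port: count} for each port used more than once in a localhost nodes line."""
    ports = []
    for entry in line.strip().split(","):
        _host, peering, api = entry.strip().rsplit(":", 2)
        ports += [int(peering), int(api)]
    return {port: n for port, n in Counter(ports).items() if n > 1}
```

Run against the original file it reports the shared 8107; with 8109 for SJC it reports nothing.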

localhost:8107:8062,localhost:8109:8063
exec.Command("6tunnel", "-6", "-l", "::", "8108", "localhost", "8062").Run()

exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8062", "localhost", "8062").Run()
exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8107", "localhost", "8107").Run()

exec.Command("6tunnel", "-4", "-l", "localhost", "8063", "sjc.foundry-typesense-cluster.internal").Run()
exec.Command("6tunnel", "-4", "-l", "localhost", "8109", "sjc.foundry-typesense-cluster.internal").Run()

svisor.AddProcess(
	"typesense-server",
	"typesense-server --data-dir=/data --api-key=xxx --api-port=8062 --peering-port 8107 --nodes=/etc/typesense-nodes --reset-peers-on-error",
	supervisor.WithRestart(0, 1*time.Second),
)

Also, are you getting an error or warning from any of the commands?

This approach seems to work locally with Docker and docker-compose, but I had to set --peering-address=127.0.0.1. I might be able to try it on Fly tomorrow.

Thanks for checking, @ianjosephwilson! If you do have a minute to try it on Fly, please let me know. Happy to debug this with you tomorrow! I really just want to help get some documentation out there on setting up HA Typesense on Fly, since I imagine others would find it quite useful too :smiley:

Edit: I hadn't noticed that you also had an example working locally. I will give this a shot.

@ianjosephwilson I'm getting "connection refused" errors when trying to reach the other servers on the peering port. I'm wondering if we also need to add a :: listener for each machine's peering port?

My other hunch is one (or both) of the following:

  • I'm not "attaching" the listener correctly here, i.e. the listener on lax.foundry-typesense-cluster.internal isn't actually listening on IPv6 (::):

exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8107", "localhost", "8107").Run()

  • Or this line isn't sending the request to SJC over IPv6:

exec.Command("6tunnel", "-4", "-l", "localhost", "8109", "sjc.foundry-typesense-cluster.internal").Run()
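One way to tell these hunches apart is to probe each hop from inside the machines: on the LAX node, connect to the local tunnel port (127.0.0.1:8109); on the SJC node, connect to its own peering port. A minimal sketch using only the standard library; `probe` is a hypothetical helper. "refused" means nothing is listening on that address and port. Note that "open" on a 6tunnel listener only shows that the tunnel accepted the connection, not that the far end answered.

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Try a TCP connect and report 'open', 'refused', or 'error: ...'."""
    try:
        # create_connection resolves host and tries each address (IPv4 or IPv6) in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:  # timeouts, DNS failures, unreachable networks
        return f"error: {exc}"
```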

Here is my setup:

localhost:8107:8062,localhost:8109:8063
if typesenseVars.Region == "lax" {

	exec.Command("6tunnel", "-6", "-l", "::", "8108", "localhost", "8062").Run()

	exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8062", "localhost", "8062").Run()
	exec.Command("6tunnel", "-6", "-l", "lax.foundry-typesense-cluster.internal", "8107", "localhost", "8107").Run()

	exec.Command("6tunnel", "-4", "-l", "localhost", "8063", "sjc.foundry-typesense-cluster.internal").Run()
	exec.Command("6tunnel", "-4", "-l", "localhost", "8109", "sjc.foundry-typesense-cluster.internal").Run()

	svisor.AddProcess(
		"typesense-server",
		"typesense-server --data-dir=/data --api-key=xxx --api-port=8062 --peering-address=127.0.0.1 --peering-port 8107 --nodes=/etc/typesense-nodes --reset-peers-on-error",
		supervisor.WithRestart(0, 1*time.Second),
	)

}

if typesenseVars.Region == "sjc" {

	exec.Command("6tunnel", "-6", "-l", "::", "8108", "localhost", "8063").Run()

	exec.Command("6tunnel", "-6", "-l", "sjc.foundry-typesense-cluster.internal", "8063", "localhost", "8063").Run()
	exec.Command("6tunnel", "-6", "-l", "sjc.foundry-typesense-cluster.internal", "8109", "localhost", "8109").Run()

	// Tunnel the LAX node's ports to LAX (not back to this SJC node).
	exec.Command("6tunnel", "-4", "-l", "localhost", "8062", "lax.foundry-typesense-cluster.internal").Run()
	exec.Command("6tunnel", "-4", "-l", "localhost", "8107", "lax.foundry-typesense-cluster.internal").Run()

	svisor.AddProcess(
		"typesense-server",
		"typesense-server --data-dir=/data --api-key=xxx --api-port=8063 --peering-address=127.0.0.1 --peering-port 8109 --nodes=/etc/typesense-nodes --reset-peers-on-error",
		supervisor.WithRestart(0, 1*time.Second),
	)

}

Here are my logs (I started using Axiom but haven't figured out how to export from it yet, so here's a screenshot):

I didn't post the Docker setup because getting IPv6 working in Docker is a weekend project in itself. I think I almost have the Fly setup working, but I'm having a hard time controlling which region I deploy to. Can you easily deploy exactly one instance to one region from the command line? flyctl seems to ignore --region.

I had to use 127.0.0.1 in place of localhost, because for some reason 6tunnel was resolving localhost to an IPv6 address. Also, when using 6tunnel I used fly-local-6pn instead of trying to find a working hostname on the internal IPv6 network.
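The localhost surprise is easy to inspect. On many systems `localhost` maps to both `::1` and `127.0.0.1` (for example via /etc/hosts), and which one a program uses depends on how it asks the resolver. A small sketch that lists what a name resolves to, optionally restricted to one address family, using only `socket.getaddrinfo`. `addresses` is a hypothetical helper. Running it on a Fly machine against `fly-local-6pn` should presumably show that machine's private IPv6 address.

```python
import socket
from typing import List


def addresses(host: str, family: int = socket.AF_UNSPEC) -> List[str]:
    """Return the unique addresses `host` resolves to, in resolver order."""
    seen: List[str] = []
    for _fam, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        host, None, family, socket.SOCK_STREAM
    ):
        if sockaddr[0] not in seen:
            seen.append(sockaddr[0])
    return seen
```

Comparing `addresses("localhost")` with `addresses("localhost", socket.AF_INET)` shows whether an IPv6 entry is listed first; passing the literal 127.0.0.1 sidesteps the question entirely.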

Yeah, I think so. I'm using Machines instead of Nomad, if that helps. First, I have primary_region = "lax" in my fly.toml. That means that if no machines are deployed and I run fly deploy, it deploys to LAX first. Then running fly machine clone --region sjc [ID OF LAX MACHINE] -c fly.toml forces the next machine into SJC.

EDIT: I should note that the docs state that, at the moment, the only way to horizontally scale machines with volumes is that clone command.

Does that work for you?

start.py

from subprocess import check_call
from os import execvp, environ


def call_exec(args):
    execvp(args[0], args)

class Config:
    def __init__(self, host, peer_port, api_port):
        self.host = host
        self.peer_port = peer_port
        self.api_port = api_port

if environ['FLY_REGION'] == 'lax':
    API_PORT = "8062"
elif environ['FLY_REGION'] == 'sjc':
    API_PORT = "8063"
else:
    raise AssertionError('Unsupported region {0}'.format(environ['FLY_REGION']))

configs = [Config(*h.split(':')) for h in environ['REAL_TYPESENSE_NODES'].split(',')]
config_lookup = {config.api_port: config for config in configs}

# Fail loudly (KeyError) if this region's API port is missing from REAL_TYPESENSE_NODES.
our_config = config_lookup[API_PORT]
other_configs = [config for config in config_lookup.values() if config is not our_config]

# Public API: listen on all IPv6 addresses on 8108 and forward to the local API port.
check_call(["6tunnel", "-6", "-l", "::", "8108", "127.0.0.1", our_config.api_port])
# Bind to fly-local-6pn (this machine's private IPv6 address) instead of
# our_config.host, which may not resolve yet. Force tunnel (-f) here as well.
check_call(["6tunnel", "-f", "-6", "-l", "fly-local-6pn", our_config.api_port, "127.0.0.1", our_config.api_port])
check_call(["6tunnel", "-f", "-6", "-l", "fly-local-6pn", our_config.peer_port, "127.0.0.1", our_config.peer_port])

# Force tunnel, -f, because other servers might not be up yet.
for config in other_configs:
    check_call(["6tunnel", "-f", "-4", "-l", "127.0.0.1", config.api_port, config.host])
    check_call(["6tunnel", "-f", "-4", "-l", "127.0.0.1", config.peer_port, config.host])


call_exec(["/opt/typesense-server", "--peering-address=127.0.0.1", "--api-address=127.0.0.1", "--data-dir=/data", "--api-key=xxx", "--api-port="+our_config.api_port, "--peering-port="+our_config.peer_port, "--nodes=/etc/typesense-nodes", "--reset-peers-on-error"])

fly.toml

# fly.toml app configuration file generated for laspilitas-ts on 2023-05-20T09:06:34-07:00
#
# See https://fly.io/docs/reference/configuration/ for information about how to use this file.
#

app = "laspilitas-ts"
kill_signal = "SIGINT"
kill_timeout = "5s"

[build]
  dockerfile = "Dockerfile.remote"

[env]
  REAL_TYPESENSE_NODES = "lax.laspilitas-ts.internal:8107:8062,sjc.laspilitas-ts.internal:8109:8063"

Dockerfile.remote

FROM typesense/typesense:0.24.0.rcn21
COPY ./typesense-nodes /etc/typesense-nodes
COPY ./start.py /start.py
RUN mkdir /data
RUN apt-get update && apt-get upgrade -y && apt-get install -y python3 6tunnel curl
ENTRYPOINT ["python3", "/start.py"]

I stopped trying to pick the region. I just set the scale to 0, then back up to 2 with a max of 1 per region, and deployed twice. I think I have some fundamental misunderstanding of how it works, but the server vote seems to have worked: one node is the leader and one is a follower. It's kind of a fragile setup, though.
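To confirm which node won the vote without reading logs, Typesense exposes a `/debug` endpoint. A sketch, assuming (from our reading of Typesense's HA docs; verify for your version) that its JSON `state` field is 1 on the leader and 4 on a follower. `fetch_debug` and `describe_state` are hypothetical helpers. With `--api-address=127.0.0.1` as in start.py, this would have to run on the machine itself against 127.0.0.1 and that node's API port.

```python
import json
import urllib.request

# Assumption: 1 = leader, 4 = follower, per Typesense's HA docs as we read them.
RAFT_STATES = {1: "leader", 4: "follower"}


def describe_state(debug_payload: dict) -> str:
    """Map the 'state' field of a /debug response to a readable role."""
    state = debug_payload.get("state")
    return RAFT_STATES.get(state, f"other ({state})")


def fetch_debug(base_url: str, api_key: str, timeout: float = 5.0) -> dict:
    """GET {base_url}/debug with the API key header and return the parsed JSON."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/debug",
        headers={"X-TYPESENSE-API-KEY": api_key},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `describe_state(fetch_debug("http://127.0.0.1:8062", "xxx"))` on the LAX node.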

Obviously this isn't set up for production, and I don't think I've actually exposed 8108 yet either. I usually only use my Typesense internally anyway.

typesense-nodes

localhost:8107:8062,localhost:8109:8063
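This file is just REAL_TYPESENSE_NODES from fly.toml with every host replaced by localhost, so start.py could presumably write it at boot instead of baking a separate copy into the image. A sketch; `local_nodes_line` is a hypothetical helper, and writing /etc/typesense-nodes at startup is an assumption, not part of the setup above.

```python
def local_nodes_line(real_nodes: str, local_host: str = "localhost") -> str:
    """Rewrite 'host:peering:api,...' so every entry points at `local_host`."""
    entries = []
    for entry in real_nodes.split(","):
        _host, peering, api = entry.strip().rsplit(":", 2)
        entries.append(f"{local_host}:{peering}:{api}")
    return ",".join(entries)


# Hypothetical use in start.py, before exec'ing typesense-server:
# with open("/etc/typesense-nodes", "w") as f:
#     f.write(local_nodes_line(environ["REAL_TYPESENSE_NODES"]))
```

Keeping one source of truth means the ports in the nodes file and the tunnels created by start.py cannot drift apart.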

My nodes are…communicating! :face_holding_back_tears:

It had to be a combination of the following:

  • using fly-local-6pn instead of the internal address of each machine (e.g. lax.app-name.internal)
  • using 127.0.0.1 instead of localhost
  • using the -f flag on the tunnels

Not sure which of these did the magic.

I spent two days on this, Ian, so I really want to thank you for your help. Once I have a… less hard-coded version written in Go, I will document all of this for the community. Thank you SO much!


I’m glad it is working. That was definitely a whirlwind tour of various issues.

It's probably important to put the 6tunnel commands under some sort of restarter, like supervisor, so that if they start failing for some reason they either get restarted or take the whole instance down. Otherwise the instance can be left in a partially working state, because as it is now the instance status depends only on the typesense-server daemon itself.
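One way to get the "or take the whole instance down" behaviour without supervisor is to run every 6tunnel in the foreground as a child process and exit as soon as any child dies, so the machine visibly fails instead of half-working. A minimal sketch, assuming 6tunnel's `-d` (don't detach) flag keeps it in the foreground (check `6tunnel -h` on your version); `first_exit` and `stop_all` are hypothetical helpers, and restarting is left to whatever restarts the machine. start.py currently exec's typesense-server, so with this approach it would be started with `Popen` alongside the tunnels instead.

```python
import subprocess
import time
from typing import List, Optional


def first_exit(procs: List[subprocess.Popen], poll_interval: float = 0.5,
               timeout: Optional[float] = None) -> Optional[int]:
    """Block until any child exits and return its index (None if `timeout` expires)."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while deadline is None or time.monotonic() < deadline:
        for i, proc in enumerate(procs):
            if proc.poll() is not None:
                return i
        time.sleep(poll_interval)
    return None


def stop_all(procs: List[subprocess.Popen]) -> None:
    """Terminate every child that is still running, then reap all of them."""
    for proc in procs:
        if proc.poll() is None:
            proc.terminate()
    for proc in procs:
        proc.wait()


# Hypothetical usage in start.py (tunnel commands as in the thread, plus -d):
# procs = [subprocess.Popen(["6tunnel", "-d", "-f", "-4", "-l", "127.0.0.1", "8109", "sjc.YOUR-APP.internal"])]
# procs.append(subprocess.Popen(["/opt/typesense-server", "--data-dir=/data"]))  # plus the other flags
# first_exit(procs); stop_all(procs); raise SystemExit(1)
```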

It still seems like there should be a better way to handle the nodes file and everything, but at least it works. Their docs say you need at least 3 nodes, though…
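The 3-node minimum follows from Raft's majority rule: a cluster of n nodes needs floor(n/2) + 1 of them reachable to elect a leader and commit writes. A two-node setup can therefore elect a leader but cannot survive losing either node. A small sketch of the arithmetic (function names are illustrative):

```python
def quorum(n: int) -> int:
    """Votes needed for a Raft majority in an n-node cluster."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that can be lost while a majority is still reachable."""
    return n - quorum(n)
```

For 2 nodes this gives a quorum of 2 and 0 tolerated failures. For 3 nodes it gives a quorum of 2 and 1 tolerated failure. An even count adds no tolerance over the odd count below it: 4 nodes still tolerate only 1 failure.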

A few notes:

  • I used fly-local-6pn because the region's internal hostname doesn't resolve until the server has started, so it's a weird kind of chicken-and-egg problem (although -f may overcome this).
  • The tunnels to the other regions need -f because the other nodes might not be up yet, so their hostnames (like lax.YOUR-APP.internal) will not resolve.
  • I don't know why localhost resolves to IPv6, but when you run the first 6tunnel command it warns that "both local and remote are ipv6 addresses". That is why I started using the explicit IPv4 address 127.0.0.1.
