Deploying a weaviate cluster...is this ip lookup workaround ok?

tldr; Is removing the default private IPv4 and IPv6 addresses from eth0 going to cause any problems?


I’ve been working on deploying a weaviate cluster on fly.io, using just the 6pn network. I’m telling weaviate to bind to fly-local-6pn, and that works fine as far as the main HTTP endpoint is concerned. The cluster wasn’t starting up though. It turns out that weaviate uses hashicorp’s memberlist to manage cluster membership. As part of connecting the cluster, memberlist gets the IP of the host and broadcasts that to the other nodes. There’s no way to configure the address it uses through weaviate at the moment, so by default it just takes the first IP assigned to the network interface…which in this case is the private IPv4 address of the VM.
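
To see which addresses are on the interface and in what order, something like the sketch below may help. `list_addrs` is a hypothetical helper name, and it only shows the order the kernel reports; whether that matches the address memberlist actually picks is an assumption.

```sh
#!/bin/sh
# list_addrs (hypothetical helper): read `ip -o addr show` output on stdin
# and print "family address/prefix" for each entry, in the order reported.
list_addrs() {
  awk '$3 == "inet" || $3 == "inet6" { print $3, $4 }'
}

# Usage on the VM (not executed here):
#   ip -o addr show dev eth0 | list_addrs
```

If the private IPv4 address shows up ahead of the 6pn address, that would be consistent with the behaviour described above.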

This of course doesn’t work because that IP isn’t routable. So my solution is to delete those addresses from the network interface before starting up weaviate:

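# First /etc/hosts entry for this hostname: the private IPv4 address
ADDR=$(grep "$(cat /etc/hostname)" /etc/hosts | sed 2d | awk '{print $1}')
ip addr del "$ADDR" dev eth0

# Second /etc/hosts entry for this hostname: the private IPv6 address
ADDR=$(grep "$(cat /etc/hostname)" /etc/hosts | sed 1d | awk '{print $1}')
# Look up the address together with its prefix length before deleting it
ip addr del "$(ip address | grep "$ADDR" | awk '{print $2}')" dev eth0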

This works. Each node in the cluster now broadcasts its (routable) 6pn address. But it’s a bit of a hack, so I wanted to ask: is this going to screw anything up?
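
One fragile bit is that the lookup relies on line position in /etc/hosts (`sed 2d` / `sed 1d`). A minimal sketch of a lookup by address family instead, assuming any address containing a `:` is IPv6; `addrs_for_host` is a hypothetical helper, and it matches the hostname as a whole field rather than as a substring like `grep` does:

```sh
#!/bin/sh
# addrs_for_host FILE HOST FAMILY (hypothetical helper)
# Print the addresses mapped to HOST in a hosts-format FILE.
# FAMILY is 4 or 6; an address containing ":" is assumed to be IPv6.
addrs_for_host() {
  awk -v host="$2" -v fam="$3" '
    /^[[:space:]]*#/ { next }            # skip comment lines
    {
      for (i = 2; i <= NF; i++) {        # fields after the address are names
        if ($i == host) {
          is6 = (index($1, ":") > 0)
          if ((fam == 6 && is6) || (fam == 4 && !is6)) print $1
          break
        }
      }
    }
  ' "$1"
}

# Usage on the VM (not executed here):
#   ip4=$(addrs_for_host /etc/hosts "$(cat /etc/hostname)" 4)
#   ip6=$(addrs_for_host /etc/hosts "$(cat /etc/hostname)" 6)
```

This should keep working if the order of the two entries ever changes, though it still assumes one address per family for the hostname.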

I could set up a network namespace and do all this within that namespace to isolate the change to just the weaviate process if I had to.

Thanks!

OMG I’ve been trying to figure this out for months, practically gave up. I don’t have a better way to do it, but thank you for sharing this workaround!

For sure! I do have a better workaround now though. Dropping those addresses kills outbound traffic, which means you can’t use any API-based modules (among other things). So this is what I’m doing now:

start-weaviate.sh:

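#!/bin/sh

# Private addresses for this hostname, as listed in /etc/hosts (IPv4 first, then IPv6)
hostname=$(cat /etc/hostname)
ip4=$(grep "$hostname" /etc/hosts | sed 2d | awk '{print $1}')
ip4addr=$(ip addr show dev eth0 | grep "$ip4" | awk '{print $2}')
ip4brd=$(ip addr show dev eth0 | grep "$ip4" | awk '{print $4}')
ip6=$(grep "$hostname" /etc/hosts | sed 1d | awk '{print $1}')
ip6addr=$(ip addr show dev eth0 | grep "$ip6" | awk '{print $2}')
# Remember the default gateway so the route can be restored later
gw=$(route | grep default | awk '{print $2}')

# Nodes started with the "node" argument join the cluster through the primary process group
[ "$1" = "node" ] && export CLUSTER_JOIN=primary.process.bidify-weaviate.internal:7100

# Assumption: the private addresses are dropped here, as in the first
# workaround, so that memberlist picks up the 6pn address at startup
ip addr del "$ip4addr" dev eth0
ip addr del "$ip6addr" dev eth0

# Restore the addresses and the default route once weaviate has started
(
  sleep 15
  ip addr add "$ip4addr" brd "$ip4brd" dev eth0
  ip addr add "$ip6addr" dev eth0
  route add default gw "$gw" eth0
) &

exec /bin/weaviate --host :: --port 8080 --scheme http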

Dockerfile:

FROM semitechnologies/weaviate:1.21.2
COPY --chmod=744 start-weaviate.sh /bin/start-weaviate
ENTRYPOINT []
CMD ["/bin/start-weaviate"]

Then in fly.toml:

[processes]
  primary = "/bin/start-weaviate"
  node = "/bin/start-weaviate node"

So now the IPs and the default route get added back 15 seconds after weaviate starts up.
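
The fixed 15 second delay is a guess at how long startup takes. A possible alternative, sketched here with a hypothetical `wait_for` helper, is to poll until some check passes and only then restore the addresses. Using weaviate's readiness endpoint as that check is an assumption; I haven't verified that readiness is only reported after memberlist has picked its address.

```sh
#!/bin/sh
# wait_for TIMEOUT CMD [ARGS...] (hypothetical helper)
# Run CMD about once per second until it succeeds.
# Returns 0 on success, 1 if CMD still fails after roughly TIMEOUT seconds.
wait_for() {
  timeout_s=$1
  shift
  elapsed=0
  until "$@"; do
    [ "$elapsed" -ge "$timeout_s" ] && return 1
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Possible replacement for `sleep 15` in start-weaviate.sh (not executed here;
# the readiness check is an assumption, see above):
#   wait_for 60 wget -q -O /dev/null http://localhost:8080/v1/.well-known/ready
```

If the check never passes, `wait_for` returns non-zero after the timeout, so the script can still fall through and restore the addresses anyway.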
