Over the weekend, we rolled out a change to the way we deploy new WireGuard peers.
Before this weekend, creating a new peer might have taken 60-90 seconds (sometimes even more). It was really painful. After this weekend, creating a new peer should take a mostly imperceptible amount of time (single-digit seconds, maybe).
If you’re a current user, and you’re mostly just using `flyctl` from a terminal window on the same machine, this change shouldn’t really impact you at all: this only impacts enrollment in WireGuard, and doesn’t change the way WireGuard works for existing peers.
But if you’re using `flyctl` as part of a CI process somewhere, this change might make a really big difference; your build times might be improved dramatically. That’s because CI builds often ask `flyctl` to make a new WireGuard peer for every run, unless you go way out of your way to cache the WireGuard credentials. That’s not something you should ever have to do anymore.
The short story of the change is that we use Consul to propagate WireGuard configuration from our API server to our production gateway machines. Consul, for us, is relatively slow (we do some abusive things with it). Worse still is the way we pulled information from Consul to WireGuard: we used a consul-templaterb template to rewrite a `wg.conf` file on our gateways, and that templating process had gotten s-l-o-w.
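To see why whole-file templating gets slower as peers pile up, here is a minimal sketch of the general technique: rendering a complete `wg.conf` from a peer list. It is not the actual consul-templaterb template; the `Peer` type, the `render_wg_conf` function, and the keys and addresses are hypothetical placeholders. Only the `[Interface]` / `[Peer]` layout follows the standard WireGuard config format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    public_key: str   # placeholder; a real key is a base64 string
    allowed_ips: str  # address range routed to this peer


def render_wg_conf(private_key: str, listen_port: int, peers: list[Peer]) -> str:
    """Render a complete wg.conf. Every call re-emits every peer."""
    lines = [
        "[Interface]",
        f"PrivateKey = {private_key}",
        f"ListenPort = {listen_port}",
    ]
    # Sort so the output is stable regardless of input order.
    for peer in sorted(peers, key=lambda p: p.public_key):
        lines += [
            "",
            "[Peer]",
            f"PublicKey = {peer.public_key}",
            f"AllowedIPs = {peer.allowed_ips}",
        ]
    return "\n".join(lines) + "\n"
```

Adding one peer with this approach means regenerating and reloading the entire file, so the cost of each enrollment grows with the total number of peers rather than with the size of the change.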
What we do now is run an internal service on all our gateways. The service is much more efficient about pulling information from Consul and applying it to our configuration. More importantly, it exposes a messaging API over NATS (which runs over our internal WireGuard mesh) that allows our API to send a message to enroll a new peer, and then get a real-time confirmation that the peer has been added, which is relayed through the API. Before this change, when you added a new peer, `flyctl` would immediately start knocking on the doors of the gateway, asking “are you ready yet? how about now?”. Now, the API just tells you for sure that the peer’s been added, and that process takes ~1 second to run.
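The difference between polling and a confirmed request can be sketched with a toy request/reply exchange. This is only an illustration of the pattern, under loud assumptions: it uses in-process queues from the Python standard library instead of NATS, and the `Gateway` class, `enroll_peer` function, and message shape are all hypothetical, not the real service's API.

```python
import queue
import threading


class Gateway:
    """Stand-in for a gateway service (the real one listens on NATS)."""

    def __init__(self) -> None:
        self.peers: set[str] = set()
        self.inbox: queue.Queue = queue.Queue()

    def serve(self) -> None:
        # Handle enrollment requests until a None sentinel arrives.
        while True:
            msg = self.inbox.get()
            if msg is None:
                break
            public_key, reply = msg
            self.peers.add(public_key)  # apply the peer to the live config
            reply.put({"ok": True, "peer": public_key})  # confirm to the caller


def enroll_peer(gateway: Gateway, public_key: str, timeout: float = 5.0) -> dict:
    """Send one enrollment request and block until the gateway confirms it."""
    reply: queue.Queue = queue.Queue(maxsize=1)
    gateway.inbox.put((public_key, reply))
    # Raises queue.Empty if no confirmation arrives within the timeout.
    return reply.get(timeout=timeout)


gateway = Gateway()
worker = threading.Thread(target=gateway.serve, daemon=True)
worker.start()

result = enroll_peer(gateway, "example-public-key")

gateway.inbox.put(None)  # stop the stand-in gateway
worker.join()
```

The caller makes exactly one request and gets back a definite answer, so there is no retry loop guessing at when the peer becomes usable.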
Again: for your existing peers, nothing should change. If you notice weirdness, let us know!
You will, if you look closely at WireGuard, probably spot weirdness unrelated to this change. That’s a whole can of worms, and we’re still working on that. We suspect that some subset of our users have connectivity issues running WireGuard directly over 51820/udp (corporate firewalls, personal firewalls, aggressive NAT, and VPNs may disrupt it). Let us know if you see that kind of weirdness too; working around these problems is an active project here.