Run private applications with Flycast

TheXe · June 17, 2024, 4:36pm

Last week I published “Jack into your private network with WireGuard”, which showed you how to connect to your organization’s private network. This week I’m gonna show off Flycast.

Take a look! (the thumbnail is a link to the YouTube video)

Intro

A lot of the time your applications are made to be public and shared with the world. Sometimes you need to be more careful. When you deploy your apps on Fly.io, you get a private network for your organization. This lets your applications on other continents contact each other as if they were in the same room.

Sometimes you need a middle ground between fully public apps and fully private apps, and Flycast is there for when you need it. Flycast addresses are private, global IPv6 addresses on your private network, and traffic to them goes through the Fly Proxy, so you get all of the load balancing and Machine-waking powers that you get for free with public apps.

Today I’ll cover what Flycast is, when and why you’d want to use it, and show you how to create an instance of Ollama that you can connect to over Flycast.

What is Flycast?

Before we get started, let’s talk about Flycast and when you would want to use it. In general we can split every kind of Fly App into two categories: public apps and private apps.

A public app is what you’d expose to the public Internet for your users. These are usually hardened apps that allow users to do some things, but have access limitations that prevent them from stepping outside their bounds. These are mostly programs that listen over HTTP for browsers to interact with. Your users connect to a public app through the platform router via the .fly.dev domain or whatever other domain you’ve set up.

A private app is something internal, like a database server or a worker queue. These are things that run in the background and help you get things done, but are intentionally designed to NOT be exposed to the public Internet. You wouldn’t want to expose your Postgres or Valkey servers to anyone, would you?

However, with a fully private app, all connections go directly to the Machines via their .internal addresses, so you have to keep them running 24/7 to maintain connectivity. This is fine for services like database engines where you want them to be running all the time, but what about an admin panel? You want your admin panel to be separate from your main app so that users can never get into it, even by accident, but you also want it to shut down when it’s not in use.

Flycast exists for this middle category of apps. With Flycast, your apps are only visible over your organization’s private network, but any traffic to them goes through the proxy, so it can start them when you need them and stop them when you don’t. This keeps your administrative panels physically separate from your public apps, so users can’t access them.

When you want to connect to an app via Flycast, you connect to appname.flycast.
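If you’re scripting against Flycast apps, a tiny helper can keep that naming convention in one place. This is a bash sketch; flycast_url is a made-up name, not part of flyctl. The optional port is the port the Fly Proxy listens on for the service (80 for plain HTTP), not the internal_port your process binds to, which is a mix-up that comes up later in this thread.

```bash
#!/usr/bin/env bash
# flycast_url APP [PORT]
# Prints the Flycast URL for APP (hypothetical helper, not part of flyctl).
# PORT is the external service port the Fly Proxy listens on. It defaults to 80,
# which is what curl uses for a plain http:// URL anyway, so it is omitted then.
flycast_url() {
  local app="$1" port="${2:-80}"
  if [ -z "$app" ]; then
    echo "usage: flycast_url APP [PORT]" >&2
    return 1
  fi
  if [ "$port" = "80" ]; then
    printf 'http://%s.flycast\n' "$app"
  else
    printf 'http://%s.flycast:%s\n' "$app" "$port"
  fi
}
```

For example, curl "$(flycast_url xe-ollama)" from any Machine in the same organization.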

Security note

Just a heads-up. In general, it’s a bad idea to assume that network access barriers like Flycast or NAT are security layers. At best, this is an obfuscation layer that makes it more difficult for attackers to get into private applications. Flycast is not a replacement for authentication in your private applications. With Flycast, you don’t know who a request is coming from, but you do know that it’s coming from something or someone in your private network.

One of the biggest platform features that uses Flycast out of the box is Fly Postgres. Even though Flycast addresses are local to your private network, Fly Postgres still configures usernames and passwords for your database.

Goal

Today we’re gonna show Flycast off by setting up an instance of Ollama.

Ollama is a program that wraps large language models and gives you an interface like Docker so that you can run open-weights large language models privately on your own device. Large language models are computationally expensive to run, so being able to offload them to a GPU-powered Fly Machine means you can hack all you want without burning up your precious battery life.

Ollama doesn’t ship with authentication by default. When you create an instance of Ollama, anyone who can reach it can use it without entering a username, password, or API key. This is fine for running your models on your own computer, but it means that if you expose it to the Internet, anyone can use it and run models whenever they want. This could rack up your bill without limit.

This is where Flycast comes in. Flycast lets you run a copy of Ollama on your private network so that you and your apps can access it, but nobody else. Flycast also lets you have the platform turn off your Ollama server when you’re not using it, which will save you money. This fits into that middle ground case that Flycast covers perfectly.

Prerequisites

In order to get started, you need to have the following:

A Fly.io account
Flyctl installed (https://fly.io/docs/flyctl/install/)

The links are in the description.

If you want to interact with your Flycast apps from your computer, like an Ollama instance, you’ll need to jack into your private network with WireGuard. The link for how to do that is in the description.

Steps

Create a new folder on your computer called ollama. This is where we’ll put the Ollama configuration. Open a terminal in that folder and run the fly launch command:

$ fly launch --from https://github.com/fly-apps/ollama-demo --no-deploy

This command creates a new Fly App from the ollama-demo template and tells flyctl not to deploy it after creating the app. If we didn’t do this, the platform would create public IPv4 and IPv6 addresses on the first deploy, which would make this a public app. The name you choose when you create your app will be used to connect to your app over Flycast.

Next, allocate a Flycast address for your app with the fly ips allocate-v6 command:

$ fly ips allocate-v6 --private

Now you can deploy the app with the fly deploy command:

$ fly deploy
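The order of those three commands is what keeps the app private: the Flycast address exists before the first deploy, and the deploy leaves the app without public IPs. If you set up apps like this often, here’s a bash sketch that wraps the sequence. The function names are hypothetical, --name is flyctl’s flag for choosing the app name up front, and DRY_RUN=1 just prints the commands so you can check them before running anything.

```bash
#!/usr/bin/env bash
# Sketch of the launch -> allocate -> deploy sequence from this walkthrough.
# Hypothetical helpers; set DRY_RUN=1 to print the commands instead of running them.

run_or_echo() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

deploy_flycast_only() {
  local app="$1"
  [ -n "$app" ] || { echo "usage: deploy_flycast_only APP" >&2; return 1; }
  # 1. Create the app from the template without deploying it (no IPs yet).
  run_or_echo fly launch --from https://github.com/fly-apps/ollama-demo --no-deploy --name "$app" || return 1
  # 2. Give it a private Flycast IPv6 address before the first deploy.
  run_or_echo fly ips allocate-v6 --private -a "$app" || return 1
  # 3. Deploy; the app is reachable over the private network via APP.flycast.
  run_or_echo fly deploy -a "$app"
}
```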

After the deploy finishes, you can see the list of IP addresses associated with an app with fly ips list:

$ fly ips list
VERSION	IP                	TYPE   	REGION	CREATED AT
v6     	fdaa:3:9018:0:1::7	private	global	23h12m ago

Learn more about Fly.io public, private, shared and dedicated IP addresses in our docs: https://fly.io/docs/reference/services/#ip-addresses

This app only has one IP address: a private Flycast IPv6 address. If it had public IP addresses, it’d look like this:

$ fly ips list -a recipeficator
VERSION	IP                    	TYPE              	REGION	CREATED AT
v6     	2a09:8280:1::37:7312:0	public (dedicated)	global	May 30 2024 13:51
v4     	66.241.124.113        	public (shared)   	      	Jan 1 0001 00:00
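If you want to check this in a deploy script instead of by eye, you can scan the TYPE column. Here’s a bash/awk sketch; has_public_ip and assert_private_only are hypothetical names, and the column layout (VERSION, IP, TYPE, REGION, CREATED AT) is taken from the output above, so treat it as an assumption rather than a stable format.

```bash
#!/usr/bin/env bash
# has_public_ip: reads `fly ips list` output on stdin and succeeds (exit 0)
# if any row's TYPE column starts with "public". Hypothetical helper that
# assumes the whitespace-separated layout: VERSION  IP  TYPE  REGION  CREATED AT.
has_public_ip() {
  # NR > 1 skips the header row; $3 is the TYPE column ("private", "public ...").
  awk 'NR > 1 && $3 ~ /^public/ { found = 1 } END { exit !found }'
}

# assert_private_only APP: fail if APP has any public address.
assert_private_only() {
  local ips
  # Capture first so a failing `fly ips list` is not mistaken for "no public IPs".
  ips="$(fly ips list -a "$1")" || return 1
  if printf '%s\n' "$ips" | has_public_ip; then
    echo "$1 has a public IP address and is reachable from the Internet" >&2
    return 1
  fi
}
```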

Now that we’ve proven it’s private, let’s open an interactive shell machine to play around with Flycast. Create the shell machine with fly machine run:

$ fly machine run --shell ubuntu
root@e784127b51e083:/#

The Ubuntu image we chose is very minimal, so we need to install a few tools such as ping, curl, and dig:

# apt update && apt install -y curl iputils-ping dnsutils

My app is named xe-ollama, so let’s look up its .flycast address with nslookup xe-ollama.flycast:

# nslookup xe-ollama.flycast
Server:		fdaa::3
Address:	fdaa::3#53

Name:	xe-ollama.flycast
Address: fdaa:3:9018:0:1::7

Awesome, it matches that IP address from earlier! Now let’s see what happens when we ping it:

# ping xe-ollama.flycast -c2
PING xe-ollama.flycast (fdaa:3:9018:0:1::7) 56 data bytes
64 bytes from fdaa:3:9018:0:1::7: icmp_seq=1 ttl=63 time=0.138 ms
64 bytes from fdaa:3:9018:0:1::7: icmp_seq=2 ttl=63 time=0.223 ms

--- xe-ollama.flycast ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1009ms
rtt min/avg/max/mdev = 0.138/0.180/0.223/0.042 ms
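The nslookup check can be scripted too, using dig from the dnsutils package installed above. This is a bash sketch with hypothetical function names. The prefix check is an assumption drawn from this walkthrough: the Flycast address and the private DNS server (fdaa::3) both start with fdaa:, while the public address of the other app starts with 2a09:. It’s a quick sanity check, not a full IPv6 parser.

```bash
#!/usr/bin/env bash
# is_6pn_address ADDR: succeed if ADDR starts with the fdaa: prefix used by the
# private addresses in this walkthrough. Hypothetical helper; prefix check only.
is_6pn_address() {
  case "$1" in
    fdaa:*|FDAA:*) return 0 ;;
    *) return 1 ;;
  esac
}

# flycast_addr APP: print the first AAAA record for APP.flycast (needs dig).
flycast_addr() {
  dig +short AAAA "$1.flycast" | head -n 1
}

# check_flycast APP: resolve APP.flycast and confirm the answer looks private.
check_flycast() {
  local addr
  addr="$(flycast_addr "$1")"
  if [ -z "$addr" ]; then
    echo "$1.flycast did not resolve; are you on the private network?" >&2
    return 1
  fi
  if ! is_6pn_address "$addr"; then
    echo "$1.flycast resolved to $addr, which does not look private" >&2
    return 1
  fi
  echo "$addr"
}
```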

Perfect, now let’s make a request to the Ollama app with curl:

# curl http://xe-ollama.flycast

It took a moment for Ollama to spin up, and now we get a happy “Ollama is running” message. Wait for your Ollama app to go back to sleep (this can take a few minutes), then use the time command to see how long the first request takes:

# time curl http://xe-ollama.flycast
Ollama is running
real	0m9.144s
user	0m0.003s
sys		0m0.003s

It took a few seconds for the platform to wake up Ollama and make sure it was ready for your requests. The next request is a lot faster:

# time curl http://xe-ollama.flycast
Ollama is running
real	0m0.043s
user	0m0.003s
sys		0m0.003s

And if you wait a few moments, it’ll spin back down.
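That cold start is worth planning for in anything that calls Ollama over Flycast, like a startup script or a health check: the first request can take several seconds while the proxy wakes the Machine. Here’s a bash sketch of a polling helper; the function name, the default attempt count and delay, and the 30-second per-request timeout are all arbitrary choices for illustration.

```bash
#!/usr/bin/env bash
# wait_for_ollama URL [ATTEMPTS] [DELAY]
# Polls URL until the response body contains "Ollama is running", trying up to
# ATTEMPTS times (default 10) with DELAY seconds between tries (default 2).
# Hypothetical helper; each request gets up to 30s so the proxy can wake the Machine.
wait_for_ollama() {
  local url="$1" attempts="${2:-10}" delay="${3:-2}" i=1 body
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failures, -sS: no progress bar but still report errors.
    if body="$(curl -fsS --max-time 30 "$url" 2>/dev/null)"; then
      case "$body" in
        *"Ollama is running"*) return 0 ;;
      esac
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "Ollama at $url did not answer after $attempts attempts" >&2
  return 1
}
```

For example, wait_for_ollama http://xe-ollama.flycast && echo ready.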

Llama 3 example

Now that we’ve set up Ollama and demonstrated the platform turning it off and on for you, let’s run Llama 3. Exit out of that shell machine with control-D so we can make a new one with the Ollama client installed.

Create an Ollama shell using fly machine run:

$ fly machine run --shell ollama/ollama

Once that starts up, point the Ollama client to your Flycast app by setting the OLLAMA_HOST environment variable:

# export OLLAMA_HOST=http://xe-ollama.flycast

Then you can ask Llama 3 anything you want:

# ollama run llama3 "Why is the sky blue?"

It takes a moment for Ollama to wake up and download the Llama 3 model, and then it answers your question. Once it’s been idle for a moment, the platform will turn Ollama back off.
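Your own apps don’t need the ollama CLI for this: the client is just talking to Ollama’s HTTP API, and anything on your private network can call that API directly. Here’s a bash sketch, assuming Ollama’s POST /api/generate endpoint with "stream": false, which returns a single JSON object whose response field holds the answer. The helper names are hypothetical, and json_escape only handles backslashes and double quotes, so keep prompts to a single line or use a real JSON tool. As far as I know, /api/generate returns an error for a model that hasn’t been downloaded yet instead of pulling it the way ollama run does, so run the step above once first.

```bash
#!/usr/bin/env bash
# json_escape STRING: escape backslashes and double quotes for a JSON string.
# Minimal on purpose: newlines and other control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# ollama_payload MODEL PROMPT: build a non-streaming /api/generate request body.
ollama_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# ask_ollama BASE_URL MODEL PROMPT: POST the prompt and print the raw JSON reply.
ask_ollama() {
  curl -fsS "$1/api/generate" \
    -H 'Content-Type: application/json' \
    -d "$(ollama_payload "$2" "$3")"
}
```

For example, ask_ollama http://xe-ollama.flycast llama3 "Why is the sky blue?" from any Machine in your organization.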

Conclusion

And there we go! We’ve covered what Flycast is, why you’d want to use it, and set up an instance of Ollama to show it off.

You can use Flycast with any application that listens over HTTP or TCP without modifying your code. UDP is trickier because there are no sessions, but there’s documentation you can follow to do this. The link is in the description.

I hope this helped you learn more about the platform and the cool hacks you can pull off on it. If you have any questions or want me to cover anything else in the future, please leave a comment in the box down below. If you’ve created something cool with Flycast, also leave a comment or shout us out on Twitter at @flydotio.

Have a good day everyone!

charsleysa · June 17, 2024, 5:24pm

If you only use one or a handful of models, you can preinstall the models and avoid needing to provision volumes.

Here’s a Dockerfile for preinstalling the qwen2 model:

FROM ollama/ollama:0.1.44

RUN ollama serve & sleep 5 && ollama pull qwen2 && kill $!

It can result in slower first boot times during deploys for regions outside of the US as the image transfer times for large images are still quite slow, but after the first boot it starts up and is ready to go in seconds.
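If you want several models baked in, one option is to move that RUN line into a small script. This is a bash sketch, not anything from the ollama-demo template: preload_models is a hypothetical name, it waits for ollama list to succeed instead of sleeping a fixed 5 seconds, and it stops the temporary server when it’s done.

```bash
#!/usr/bin/env bash
# preload_models MODEL...
# Starts a temporary Ollama server, pulls each MODEL into the local model store,
# then stops the server. Hypothetical helper meant for a Dockerfile RUN step.
preload_models() {
  local pid i=0 model status=0
  ollama serve >/dev/null 2>&1 &
  pid=$!
  # Wait (up to ~30s) until the server answers, instead of a fixed `sleep 5`.
  until ollama list >/dev/null 2>&1; do
    i=$((i + 1))
    if [ "$i" -ge 30 ]; then
      echo "ollama serve did not become ready" >&2
      kill "$pid" 2>/dev/null
      return 1
    fi
    sleep 1
  done
  for model in "$@"; do
    if ! ollama pull "$model"; then
      echo "failed to pull $model" >&2
      status=1
      break
    fi
  done
  # Stop the temporary server; ignore the non-zero status `wait` reports for it.
  kill "$pid" 2>/dev/null
  wait "$pid" 2>/dev/null || true
  return "$status"
}
```

You could COPY the script into the image and call it from a RUN step, e.g. RUN bash -c '. /preload.sh && preload_models qwen2 llama3' (the /preload.sh path is only an example). Every extra model makes the image bigger, which feeds directly into the slower first boots mentioned above.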

hbagdi · June 18, 2024, 10:48pm

How would DNS work in this case if the private IP is allocated in a different network (via the --network flag)?

kylemclaren · June 19, 2024, 12:05pm

Oh good question! The custom network stuff is not well documented. You can expose an app outside its custom network through Flycast. You’ll want to allocate an IPv6 on the target app, specifying the network you want it to be accessible from:

fly ips allocate-v6 --private --network <custom_network_name> --app <target_app>

If you want it to be accessible from your org’s default network, you can just omit --network.
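The two variants differ only in whether --network is passed, so if you script this, one wrapper can cover both. A bash sketch; allocate_flycast is a hypothetical name, and it only assembles the fly ips allocate-v6 flags quoted above (DRY_RUN=1 prints the command instead of running it).

```bash
#!/usr/bin/env bash
# allocate_flycast APP [NETWORK]
# Allocates a Flycast address for APP. With NETWORK, the address is reachable
# from that custom network; without it, from the org's default network.
allocate_flycast() {
  local app="$1" network="${2:-}"
  [ -n "$app" ] || { echo "usage: allocate_flycast APP [NETWORK]" >&2; return 1; }
  local -a cmd=(fly ips allocate-v6 --private)
  if [ -n "$network" ]; then
    cmd+=(--network "$network")
  fi
  cmd+=(--app "$app")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```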

charsleysa · June 19, 2024, 12:30pm

That’s so cool! This pretty much allows for nested networks with flycast acting as the bastion.

hbagdi · June 19, 2024, 4:33pm

What is the DNS record that the target app (in the custom_network_name) can use? In other words, which DNS record will resolve to the IP address we just allocated?

kylemclaren · June 19, 2024, 7:19pm

<app_name>.flycast

hbagdi · June 19, 2024, 8:27pm

Fantastic. So, it works across networks just like within a network. One final question: is there any way to expose private apps to other apps without going via the proxy in the middle?

tmaier · September 5, 2024, 9:44pm

@TheXe this is cool!

I promise my customers that I encrypt all the traffic in-transit. Now I see that this works with HTTP.
How about HTTPS? I suppose there is no automatic SSL certificate set up for this domain, right?

Besides HTTPS, I actually wonder if this is necessary at all, since there is the Wireguard mesh, as I read in Transport security between proxy and fly machine

charsleysa · September 5, 2024, 10:07pm

@tmaier HTTPS isn’t needed if it all stays within fly as traffic is already encrypted during transit because of the Wireguard mesh.

madsciai · December 29, 2024, 6:09am

Thank you for these resources @TheXe. I’m watching/reading and have two questions for anybody in the community.

Am I correct that a WireGuard jack-in would not be necessary if I’m not intending to ping my Ollama server on Flycast from my local machine for now? I’m configuring the connection between my public Streamlit web app and Flycast Ollama server app.

I could easily end up doing it for further testing and development, but I don’t see a need otherwise. Have not read about Wireguard Mesh yet though, and I will to see if this would help in my case.

I’m meticulous about the small number of models I want to be available, and like all the other config stuff in the Dockerfile and toml pretty much. Would it make a difference if I don’t clone the original repo as the base for my own app and write it out myself with what I want to customize? (I think this is a no-brainer, it would, and would confuse me less to do so somehow. Corrections welcome!)

I want to use the multi-model launch Dockerfile idea that @charsleysa posted. However, how does that mesh with the one in the template repo? I’m looking at it, the other Dockerfile, and the server script, and I’m not as sure as I thought about where or what I would test to find out.

(If this should be in another topic like the help section, I can move it)

mayailurus · December 30, 2024, 4:36am

Right, the instructions mentioned in your other thread should have resulted in your Ollama app having both a Flycast address and an [http_service] block in its fly.toml, which should be enough to make it reachable from all your other Fly.io machines.

Try the curl test that @TheXe (wisely) included. (You should use the default port 80 here, not the more intuitive internal_port.)

Hope this helps a little!

Note: Commands preceded by an octothorpe/number-sign (#) prompt in the first column execute within the temporary machine that fly machine run created. Thus, they have access to the internal network without additional fiddling.

Note²: Also, obviously replace xe-ollama with your own Ollama app’s name.

madsciai · December 30, 2024, 5:02am

Update: curl test successful! (Assuming getting the real/user/sys stats returned like in the video is what’s intended.) However, I forgot to change the port to 80/remove internal_port, and it still worked? (Might need to update/re-deploy, as it still worked when I removed that line entirely from the toml file and it isn’t detecting the change.)

Thank you for the clarifications here! You can see I’m all over the place, learning a lot. I had switched to [services] in fly.toml, but I verified I do have that block per the instructions you linked. However, it does have internal_port = 11434 in it.

When you say use the default port 80, is this for the [http_service] spec in the Ollama app, parallel to the curl command you posted? (I’m multitasking on docs too much and discussing on Ollama Discord at the moment so this could be me missing something.)

When I fix the http service section and re-deploy, would this be how my web app (on a .dev site at the moment) would activate it/wake it up with a user ping?

What’s happening once I run the first fly machine command in this test? It’s pulling an Ubuntu base image and installing ip/dns packages. Is this for curl pings specifically? Just curious.

thank you!

mayailurus · December 30, 2024, 5:57am

Those are from the time command (which is a good utility to know, in general).

@TheXe’s post also shows Ollama is running, which is the more critical thing to look for here…

Port 80 was for curl, not for fly.toml, actually. Sometimes people attempt xe-ollama.flycast:11434, reasoning that internal access should use the internal_port.

(Since the Fly Proxy sits between the client and the server, there are two ports involved.)

The command at the dollar-sign ($) prompt should work from any platform that flyctl supports. The commands after that will then be within Ubuntu, though…

madsciai · December 30, 2024, 6:19am

Thank you!! I was just about to ask if port 80 was only for curl. And no issues running the commands inside the machine shell with Ubuntu, no errors. I am making more progress with this. Not sure if I’ll run into anything re: Ubuntu, but doubtful, as I’m trying to have everything in the cloud.

I accidentally had Streamlit’s local port listed as an internal_port in my web app toml, so removed that and re-deployed.

Technically I’m still a bit puzzled about what it truly means to “bind your app to 0.0.0.0:” or basically in what part of my system I would do this. I’ll get there.
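On the binding question: the Fly Proxy connects to your Machine from outside it, so the process has to listen on an address other than 127.0.0.1, since loopback only accepts connections from inside the Machine itself. Listening on 0.0.0.0 means “every IPv4 interface”, and [::] is the IPv6 equivalent. You set this in the app’s own startup configuration, for example Streamlit’s --server.address flag or Ollama’s OLLAMA_HOST variable. One way to check from inside a Machine is to look at ss -ltn. A bash/awk sketch; listens_on_all is a hypothetical name, and it assumes ss’s default layout where the fourth column is Local Address:Port.

```bash
#!/usr/bin/env bash
# listens_on_all PORT: reads `ss -ltn` output on stdin and succeeds if something
# listens on PORT on a wildcard address (0.0.0.0, [::], ::: or *), i.e. not only
# on loopback. Hypothetical helper; assumes column 4 is "Local Address:Port".
listens_on_all() {
  awk -v port="$1" '
    NR > 1 && ($4 == ("0.0.0.0:" port) || $4 == ("[::]:" port) ||
               $4 == (":::" port) || $4 == ("*:" port)) { found = 1 }
    END { exit !found }
  '
}
```

For example, ss -ltn | listens_on_all 11434 || echo "only on loopback (or not listening at all)".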