Configuring a Python app to listen + bind to internal ports for 2 private service apps?

Update: I realize my ask sounds quite train-of-thought, so I’m working on revising my questions to be more specific/granular. (tbh it’s the excitement of being quite close to launch!)

I’m working on an LLM-powered chatbot that runs as a Streamlit Python app, with a private Ollama server app (internal, reached via flycast) and a private ChromaDB server as a vector database (same internal address type).

As an app developer I’m a bit new to networking config, DNS, and the like, so mainly I’m asking whether/how the Python code should be written to do this. I have the OLLAMA_HOST env variable configured as a secret in my web app, set to the flycast address of the Ollama server.

But how do I additionally bind to 0.0.0.0:<port>? (I’m seeing this here, regarding working with Fly Proxy.) Is this part of the Python script for the web app, or am I totally off and it belongs in the fly.toml of either the Ollama app or the web app?

If helpful, I have an Ollama internal port number (11434) and its flycast DNS address, which is used (unless that’s my error) via a Fly secret read with os.getenv("OLLAMA_HOST"). I basically followed this, but spun it up as a standalone app.

I am still learning how to get it to autostart when my web app calls the host address. I have scale-to-zero and auto-start configured, so I believe the problem is the web app not calling the host address properly (retry sketch just below).
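In case it’s relevant, here’s the kind of retry wrapper I’m planning to put around the first request, on the assumption that a scaled-to-zero machine needs a few seconds to boot when the proxy wakes it (the helper name and timings are mine):

import time
import requests

def wait_for_ollama(base_url: str, attempts: int = 8) -> None:
    # A stopped machine is booted by Fly Proxy on the first request, so the
    # first call can be slow or refused; retry with a short backoff.
    for i in range(attempts):
        try:
            requests.get(base_url, timeout=5).raise_for_status()
            return
        except requests.RequestException:
            time.sleep(min(2 ** i, 10))
    raise RuntimeError(f"Ollama at {base_url} never became reachable")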

Today I also broke my Ollama app trying to manually scale to 2 machines, as Fly is advising me to go up from 1 (I can see the suggestion), though perhaps I’m just new here. Could this be because, when I scaled to another machine, the command also created a new volume with the same name as the existing one? Do I need 2 machines/2 volumes? (Tangential questions I can probably search for, but I’m open to recommendations.)

Haven’t started on Chroma, but it’s a somewhat parallel situation to Ollama: an HTTP client/server setup with token-header authentication (which I’ll have to complement on the web app side). For now I’m more focused on getting at least the LLM inference going; a guess at the client side is sketched below.
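When I do get to Chroma, I expect the client side to look roughly like this; the app name, env var names, and the bearer-token header are all my assumptions at this point:

import os
import requests

# Hypothetical flycast address and token, both stored as Fly secrets.
CHROMA_HOST = os.getenv("CHROMA_HOST", "http://my-chroma-app.flycast")
CHROMA_TOKEN = os.getenv("CHROMA_TOKEN", "")

# Chroma exposes a heartbeat endpoint that makes a cheap connectivity check.
resp = requests.get(
    f"{CHROMA_HOST}/api/v1/heartbeat",
    headers={"Authorization": f"Bearer {CHROMA_TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())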

TLDR/easy option: Ollama port + address + Python app port(s) + address(?) talking to each other, as a private backend server and a client web UI.
What are the right numbers?
What’s an example of the Python code to write the connection? (My current guess is sketched below.)
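For concreteness, here’s my current guess at the connection code; the app name and model are placeholders, and I’m assuming Ollama’s HTTP /api/generate endpoint:

import os
import requests

# Set via `fly secrets set OLLAMA_HOST=http://my-ollama-app.flycast`
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://my-ollama-app.flycast")

def generate(prompt: str, model: str = "llama3") -> str:
    # Non-streaming call to Ollama's generate endpoint.
    resp = requests.post(
        f"{OLLAMA_HOST}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

print(generate("Say hello in one sentence."))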

Thanks for any help

No worries! Volumes are very non-magical here, which confuses people due to the stark contrast with the rest of the Fly.io platform, :portentous_raven:

I think I may have figured out what I’m missing: I haven’t set anything up with WireGuard for internal apps. On this page they even use an Ollama instance as the example. (I could revise my question, or my research may just get there!)

Edit: after more discovery, I don’t think I need WireGuard to configure a totally cloud-based connection. I will revise my post with the answers I find vs. more questions, since that would maybe help someone else and be faster.

Have learned a lot since this first post! I know the internal network has a WireGuard mesh, but I don’t think I need to set up any connections with it until I want to ping/test my flycast app over the VPN from my own machine.

I have found via the logs that Streamlit is listening on the wrong port: its default, 8501. The logs say I need to have it listen on 8080. When the app boots, it does report that it is on 8501.

[PC01] instance refused connection. is your app listening on 0.0.0.0:8080? make sure it is not only listening on 127.0.0.1 (hint: look at your startup logs, servers often print the address they are listening on)

I do have a question, though. To change the port for the web UI, the Streamlit docs suggest using a reverse proxy like nginx. I am curious how that would integrate with my Fly network/Fly Proxy.

The Fly Proxy would generally take the place of Nginx, here.

Perhaps you could post the Streamlit app’s full fly.toml, so we forum readers can see where you’re at, currently?

Does it mention the binding address (typically 0.0.0.0, 127.0.0.1, or ::)?

The port discrepancy is easy to fix, but if the IP address is awry, it might take some more digging through the docs…
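To illustrate the distinction with plain Python (a toy server, not your app): binding to 127.0.0.1 only accepts connections originating inside the machine itself, which is exactly what that proxy error is complaining about.

import http.server
import socketserver

# ("0.0.0.0", 8080) accepts connections on every interface, so Fly Proxy can
# reach it; ("127.0.0.1", 8080) would be reachable only from within the machine.
with socketserver.TCPServer(("0.0.0.0", 8080), http.server.SimpleHTTPRequestHandler) as srv:
    srv.serve_forever()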

Does it mention the binding address (typically 0.0.0.0, 127.0.0.1, or ::)?

Yes, and it is now correct as 0.0.0.0.

I figured a lot out, including that realization, lol. Plus, it was Streamlit that was serving the other port, and Streamlit’s config is where I could add 0.0.0.0, per these docs. I keep changing my fly.toml, but I could still share it. I’ve been looking at my logs each time I run fly deploy, and the app also seems to be restarting a lot. It may be more useful to share the fly.toml alongside my Streamlit config.
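For anyone landing here later, the shape of the fix as I understand it is just launch flags (or the equivalent [server] settings in .streamlit/config.toml); here’s a minimal app to show where they go:

# app.py -- minimal Streamlit app.
# Launch it so it matches fly.toml's internal_port and binds to all interfaces:
#   streamlit run app.py --server.port 8080 --server.address 0.0.0.0
import streamlit as st

st.title("Chatbot")
st.write("Hello from 0.0.0.0:8080")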

Right now my blocker is still that the app doesn’t see OLLAMA_HOST. I had to find a Poetry plugin to load a .env environment variable, but it seems not to work. I would love to know how to access a Fly secret from within my Streamlit app, as I believe that would be the best place to store the flycast address.
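My current understanding (please correct me if wrong) is that a secret set with fly secrets set arrives in the running machine as an ordinary environment variable, so the .env plugin should only matter for local development:

import os
import streamlit as st

# At runtime on Fly.io, `fly secrets set OLLAMA_HOST=...` shows up as a plain
# environment variable; no .env loading is needed in the deployed app.
ollama_host = os.getenv("OLLAMA_HOST")
if not ollama_host:
    st.error("OLLAMA_HOST is not set. Run: fly secrets set OLLAMA_HOST=http://<app>.flycast")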

Typing out forum posts really helps me answer my own questions alongside asking others, and by writing this I realize I need to do some fixes on the Ollama app side, so I’ll share those config files if I hit a blocker again. In the meantime, thank you for all the help!

OK, currently I am fixing my Ollama flycast app, as there are some issues I’m seeing from past development now that I’ve learned more. To start, here’s the fly.toml:

app = "mad-sci-app-name"
primary_region = "ord"

[build]
  image = "ollama/ollama:latest"

[[mounts]]
  source = "models"
  destination = "/root/.ollama"
  initial_size = "10gb"

[http_service]
  force_https = false
  auto_stop_machines = "stop"
  auto_start_machines = true
  min_machines_running = 0
  processes = ["app"]

[[services]]
  internal_port = 11434 # do I change this to the same as my web app internal port, which is 8501 for streamlit?

  [[services.ports]] # not sure about this addition...
    handlers = ["http"]
    port = 80

[[vm]]
  size = "performance-8x"
  memory = "16gb"
  gpu_kind = "l40s"

I think I’ve got this first one: I should be able to destroy it and the attached volume.
First question: I created a second machine, I think incorrectly, while I was only running on 1 machine. My app needs to run on a GPU, and this second one, which I just created by running fly scale count 2, is a smaller CPU VM running an Ubuntu image rather than an Ollama one. I plan to destroy it; in that case, is it fine to run on only 1 GPU machine if I have autoscaling configured correctly?

answer is here
Aside about the Streamlit app: The OLLAMA_HOST value should be http://my-app-name.flycast, correct? Or should it be the numerical IP?

Second: running fly services list on the Ollama app shows 2 services with TCP protocol and http handlers for ports 80 and 443, plus a blank-protocol one that’s 80 => 11434, which seems odd. Is this what my fly.toml generated at launch/deploy?

PROTOCOL  PORTS        HANDLERS    FORCE HTTPS  PROCESS GROUP  REGIONS  MACHINES
TCP       80 => 0      [HTTP]      False        app            ord      1
TCP       443 => 0     [HTTP,TLS]  False        app            ord      1
          80 => 11434  [HTTP]      False        app            ord      1

Edit: I think my best bet here is to re-create the app altogether, since I now know the correct toml values etc., still using the demo template as a base.

It’s up to you, but I think the fly services list anomaly is due to the redundant [[services]] block. It’s defining the same thing as [http_service], and that unfortunately tends to confuse the Fly.io infrastructure, :fish_cake:.

The original [http_service] definition from the instructions, on its own, should be sufficient. There is no relationship between the Ollama app’s internal_port and the Streamlit app’s internal_port. The latter is a client of the former, just like curl was, so it will use port 80 instead.
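If you want to sanity-check that from the Streamlit machine’s console, something like this should do, substituting your real app name; Ollama answers its root path with a short status string:

import requests

# Fly Proxy listens on port 80 for the .flycast address, so no explicit
# port is needed; the proxy forwards on to internal_port 11434.
resp = requests.get("http://my-app-name.flycast/", timeout=30)
print(resp.status_code, resp.text)  # expect: 200 Ollama is running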


I was Googling to make sure I was correct about the internal address for Chroma’s auth/CHROMA_HOST and ended up at my own forum topic about Ollama. I love the support here!