Update: I realize my ask reads as train-of-thought, so I'm working on revising my questions to be more specific/granular. (tbh it's the excitement that I'm quite close to launch!)
I'm working on an LLM-powered chatbot: a Streamlit Python web app, a private Ollama server app (internal, on a flycast address), and a private ChromaDB server as the vector database (same internal address type).
As an app developer I'm fairly new to networking config/DNS, so mainly I'm asking whether/how the Python code should be written to do this. I have the OLLAMA_HOST env variable configured as a secret in my web app, set to the flycast address of the Ollama server.
But how do I additionally bind to 0.0.0.0:<port>? (Seeing here regarding working with Fly Proxy.) Is this part of the Python script for the web app, or am I totally off and it belongs in the fly.toml of either the Ollama app or the web app?
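For context, here's the shape I think the web app's fly.toml would take if the binding belongs there (8501 is just Streamlit's default port; all of this is my guess, not a verified config):

```toml
# fly.toml of the Streamlit web app (sketch; values are my assumptions)
[http_service]
  internal_port = 8501   # must match the port Streamlit listens on

[processes]
  # Bind Streamlit to 0.0.0.0 so Fly Proxy can reach the machine
  app = "streamlit run app.py --server.address 0.0.0.0 --server.port 8501"
```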
If helpful: Ollama's internal port is 11434, and its flycast DNS address is stored in a Fly secret that I read with os.getenv("OLLAMA_HOST") (unless that's where my error is). I basically followed this but spun it up as a standalone app.
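In the web app I'm reading the secret roughly like this (the fallback URL is a placeholder; /api/tags is just Ollama's model-list endpoint, which I'm using as a connectivity check):

```python
import os

import requests

# Assumption: the OLLAMA_HOST secret holds the full URL including scheme
# and port, e.g. "http://<ollama-app>.flycast:11434"
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")

def list_models() -> list[str]:
    """Connectivity check: ask Ollama which models it has pulled."""
    resp = requests.get(f"{OLLAMA_HOST}/api/tags", timeout=10)
    resp.raise_for_status()
    return [m["name"] for m in resp.json().get("models", [])]
```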
I'm still learning how to get Ollama to autostart when my web app calls the host address. I have scale-to-zero and auto start configured, so I believe the issue is the web app not calling the host address properly.
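My understanding (please correct me) is that autostart only triggers when requests go through Fly Proxy, which flycast addresses do and plain .internal addresses don't, and that the Ollama app's fly.toml needs something like this (sketch of my current understanding, not verified):

```toml
# fly.toml of the Ollama app (sketch)
[http_service]
  internal_port = 11434        # Ollama's default port
  auto_stop_machines = "stop"  # scale to zero when idle
  auto_start_machines = true   # Fly Proxy starts a machine on request
  min_machines_running = 0
```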
Today I also broke my Ollama app trying to manually scale to 2 machines, since Fly is advising me to go up from 1 (though perhaps I'm just new here). Could this be because scaling to another machine also created a new volume with the same name as the existing one? Do I need 2 machines and 2 volumes? (Tangential questions I can probably search, but open to recommendations.)
Haven't started on Chroma, but it's a roughly parallel situation to Ollama: an HTTP client/server setup with token-header authentication (which I'll have to implement on the web app side). For now I'm more focused on getting the LLM inference going.
TLDR/easy option: Ollama's port + address and the Python app's port(s) + address talking to each other, as a private backend server and a client web UI.
What are the right numbers?
What's an example of the Python code that makes the connection?
Thanks for any help