Networking for RAG chatbot w/ Ollama + Chroma? Ready to deploy!

I have a RAG chatbot, written in Python, to deploy using Ollama w/ the Gemma2 model + a ChromaDB vector store. I've bolded my actual questions so this is skimmable. Many thanks!

I’ve been watching all of Fly’s GPU-related videos on YouTube, mainly these:

and they've been helpful, though I'm stuck on a few bits.

**Is making HTTP requests to an IPv6 app in your organization/network the only way for a deployed + hosted web app to use (for example) Ollama?**

To query the LLM, the app isn't making raw HTTP requests; it uses the Ollama Python package. **(Does this matter?)**
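For context, the Ollama Python package is a thin wrapper over Ollama's HTTP API, so pointing it at a private hostname should behave the same as raw requests. A minimal sketch of the equivalent request, assuming a placeholder app name `my-ollama` reachable at `my-ollama.internal` over Fly private networking:

```python
import json
import urllib.request

# Placeholder app name; <app>.internal resolves over the org's private network.
OLLAMA_URL = "http://my-ollama.internal:11434"

def build_chat_request(prompt: str, model: str = "gemma2") -> urllib.request.Request:
    """Build the POST /api/chat request that the ollama package sends for you."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# The ollama package does the same thing under the hood; you'd just set the host:
#   client = ollama.Client(host=OLLAMA_URL)
#   client.chat(model="gemma2", messages=[{"role": "user", "content": "hi"}])
```

So the package shouldn't change anything about the networking picture; it only changes who builds the HTTP request.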

**In the case of Ollama being hosted on its own internal VM, would I also need to attach a volume to it to store my model(s)?**

(The first video does this.)
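For what it's worth, Ollama keeps pulled models under `/root/.ollama` by default, so a volume mounted there would survive machine restarts without re-pulling Gemma2. A hedged fly.toml sketch for the Ollama app (app and volume names are placeholders):

```toml
# fly.toml for the internal Ollama app (sketch; names are illustrative)
app = "my-ollama"

[mounts]
  source = "ollama_models"       # created with: fly volumes create ollama_models
  destination = "/root/.ollama"  # Ollama's default model directory
```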

I'm using ChromaDB as the vector database/embeddings store for the chatbot's RAG functionality.

**Would this element, like Ollama, be a case for an internal VM app setup to which the web app makes requests?**

I got some excellent help, including diagrams, on the Chroma Discord, and it seems I could choose from a few options.
Option 3 seems to be what I'd prefer based on my research, if I'm on the right track with a Web App + Ollama + ChromaDB network setup. But I don't understand why the web app (user app) would need a volume if the Chroma app has one… Also,

**which of them would need a GPU? Only the Ollama VM storing the models, that one plus the web app, or all three?**


The bottom one, but with an additional GPU app that runs your LLM. Your chat app would talk to your DB and proxy requests to the Ollama server.
You might want to wrap your Ollama server with something light that stops the machine early when it's idle, to save money.
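For reference, Fly's proxy can also stop idle machines on its own if the Ollama app exposes an `http_service`, which may make a custom wrapper unnecessary. A sketch of the relevant settings (values are illustrative, not a full fly.toml):

```toml
# Sketch: let Fly's proxy stop the GPU machine between requests
# instead of wrapping Ollama yourself.
[http_service]
  internal_port = 11434        # Ollama's default port
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0     # allow scale-to-zero when idle
```

This matters most for the GPU machine, since that's where idle time is expensive.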

Yes, that's what I'm thinking. **Would the chat app using the Python package for Chroma work for communicating with the ChromaDB Fly app?**
As far as wrapping goes, I'm not sure… I've struggled with llama.cpp, though I know it's lighter.
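On the Chroma question: the `chromadb` package's `HttpClient` speaks Chroma's HTTP API, so it should work across Fly private networking the same way raw requests would. A stdlib sketch of the server it would talk to, assuming a placeholder internal hostname `my-chroma.internal`:

```python
import urllib.request

# Placeholder internal hostname for the Chroma app; adjust to your org.
CHROMA_HOST = "my-chroma.internal"
CHROMA_PORT = 8000  # Chroma server's default port

def heartbeat_url(host: str = CHROMA_HOST, port: int = CHROMA_PORT) -> str:
    """Chroma's liveness endpoint; chromadb.HttpClient(host=..., port=...)
    talks to this same HTTP API."""
    return f"http://{host}:{port}/api/v1/heartbeat"

def check_alive(url: str) -> bool:
    # Network call: True if the Chroma server answers (run from inside the org).
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False
```

The point being: the Python package vs. raw HTTP question has the same answer for Chroma as for Ollama.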

I'm not familiar with the library, but I don't see why not. Try Ollama instead; it's easier.

I agree, it's been great for me. Thanks for your help! If you know, one quick final question: **are process groups justified here, and where would they go?** I'm a bit stuck on that.

You might be able to get away with just a targeted build step for your app and DB machines instead.

Thank you so much for your help @khuezy :slight_smile: I am working on a setup aligned with that third diagram. I certainly feel you've provided a solution, but I'm not sure if it matters which post I mark as the solution…?
Also, if I can avoid process groups I may choose to, as they're a bit confusing for my use case.

I'll probably share at least some of the config I come up with in this thread, especially which VM type each app gets. I know the LLM app (Ollama) and the DB app (Chroma) would use volumes (but my web app too??). However, I'm very comfortable with Postgres and was originally testing RAG with Supabase as my vector DB, before Chroma retrieval worked out a bit better.

But would switching from Chroma to Fly Postgres save costs compared to using volumes in my apps? I'm leaning toward no, since Fly Postgres still uses volumes… I may experiment to find out, since I'm asking so many questions. :sweat_smile:

I'd wait until Fly launches their managed database (managing your own production-grade DB is a nightmare).

I'd also recommend Turso; they have vector support too, and a generous free tier.

Got it. I've been a bit fuzzy on how I'll manage Chroma w/ RAG, and for a managed option I'd be fine going with Supabase, even though it's in alpha. I think I can wrap my head around relational DB/SQL vs. document-store DB for prod.

But I'll check out Turso regardless; it's always good to know the options!

What do you think of Redis? I'm not familiar with Upstash, but I have some experience w/ Redis.

Redis is great for a bunch of use cases, but I don't use it since I'm already getting sub-millisecond queries with Turso/SQLite.

If you don't mind, I'd like to ask you sometime about your DB setup @khuezy. I'm slammed for time right now, but I'm wondering if there's an opportunity for me to improve.

It’s nothing special. I use a managed SQLite DB from Turso and its embedded replica feature.

I'm not smart enough to manage my own distributed DB, nor do I have the time.