I have a RAG chatbot to deploy, written in Python, using Ollama with the Gemma 2 model plus ChromaDB as the vector store. I've bolded my actual questions so this is skimmable. Many thanks!
I've been watching all of Fly's GPU-related videos on YouTube.
To query the LLM from the app, I'm not making raw HTTP requests but using the Ollama Python package. **Does this matter?**
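For reference, the querying code looks roughly like this (a sketch; the `ollama-app.internal` hostname is my assumption of how Fly private networking would address the Ollama app):

```python
# Sketch: querying the model through the Ollama Python package.
# The host URL is an assumption -- on Fly, an internal app should be
# reachable over private networking at <app-name>.internal.
from ollama import Client

client = Client(host="http://ollama-app.internal:11434")

response = client.chat(
    model="gemma2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response["message"]["content"])
```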
**If Ollama is hosted on its own internal VM, would I also need to attach a volume to it to store my model(s)?** (The first video does this.)
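If I understand the video right, the Ollama app's fly.toml would carry something like this (app and volume names are placeholders):

```toml
# Sketch of the Ollama app's fly.toml. Without a volume, pulled
# models vanish whenever the machine is recreated. Ollama stores
# models under /root/.ollama by default when running as root.
app = "ollama-app"

[mounts]
  source = "ollama_models"      # created with: fly volumes create ollama_models
  destination = "/root/.ollama" # Ollama's default model directory
```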
I'm using ChromaDB as the vector database/embedding store for the RAG functionality of the chatbot.
**Would this component, like Ollama, call for an internal VM app setup that the web app makes requests to?**
I got some excellent help, including diagrams, on the Chroma Discord, and it seems I can choose from a few options.
Option 3 seems to be what I'd prefer based on my research, if I'm on the right track with a Web App + Ollama + ChromaDB network setup. **But I don't understand why the web app (user app) would need a volume if the Chroma app has one. Also, which of these apps would need a GPU: only the Ollama VM storing the models, that one plus the web app, or all three?**
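My own guess from Fly's GPU docs is that only the Ollama app needs a GPU machine, i.e. something like the following in its fly.toml (the size preset here is just an example):

```toml
# Guess, per Fly's GPU docs: only the Ollama app gets a GPU VM;
# the web app and Chroma can run on ordinary CPU machines.
[[vm]]
  size = "a100-40gb"  # one of Fly's GPU machine presets
```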
The bottom, but with an additional GPU app that runs your LLM. Your chat app would talk to your DB and proxy requests to the Ollama server.
You might want to wrap your Ollama server with something light to terminate the machine early to save money.
Yes, that's what I'm thinking. **Would the chat app using the Python package for Chroma work for its communication with the ChromaDB Fly app?**
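Something like this is what I have in mind (a sketch; the internal hostname is my assumption, and 8000 is Chroma's default server port):

```python
# Sketch: the chat app talking to the Chroma Fly app via the
# chromadb Python client over Fly private networking.
import chromadb

chroma = chromadb.HttpClient(host="chroma-app.internal", port=8000)

collection = chroma.get_or_create_collection("docs")
results = collection.query(
    query_texts=["What does the manual say about setup?"],
    n_results=3,  # top-k chunks to feed into the RAG prompt
)
```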
As far as wrapping goes, I'm not sure… I've struggled with llama.cpp, though I know it's lighter.
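If I'm reading Fly's docs right, the proxy's auto-stop settings might cover the "terminate early to save money" part without a custom wrapper. An untested sketch:

```toml
# Untested sketch: let Fly's proxy stop the GPU machine when idle,
# assuming Ollama is exposed as an http_service on its default port.
[http_service]
  internal_port = 11434        # Ollama's default port
  auto_stop_machines = true    # stop the machine when traffic goes idle
  auto_start_machines = true   # wake it on the next request
  min_machines_running = 0     # allow scale to zero
```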
I agree; it's been great for me. Thanks for your help! One quick final question, if you know: **are process groups justified here, and where would they go?** I'm a bit stuck on that.
Thank you so much for your help @khuezy. I'm working on a setup aligned with that third diagram. I certainly feel you've provided a solution, but I'm not sure it matters which post I mark as such…?
Also, if I can avoid process groups, I may choose to, as they're a bit confusing for my use case.
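For anyone following along, my understanding is that process groups would look something like this in a single app's fly.toml (the commands are hypothetical):

```toml
# Hypothetical sketch of process groups: named commands in one app,
# each group runnable on its own machines. For my setup, separate
# Fly apps seem simpler, since Ollama and Chroma need different
# images, volumes, and VM sizes anyway.
[processes]
  web = "python main.py"       # the chat web app (command is a guess)
  worker = "python ingest.py"  # e.g. a doc-ingestion job (hypothetical)
```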
I'll probably share at least some of the config I come up with in this thread, especially the VM type(s) each app gets. I know the LLM app (Ollama) and the DB app (Chroma) would use volumes (**but would my web app need one too??**). I'm also very comfortable with Postgres, and I was originally testing RAG with Supabase as my vector DB before Chroma's retrieval worked out a bit better.
**But would switching from Chroma to Fly Postgres save costs compared to using volumes in my apps?** I'm leaning toward no, since Fly Postgres still uses volumes… I may experiment to find out, since I'm asking so many questions.
Got it. I've been a bit fuzzy on how I'll manage Chroma with RAG, and for a managed option I'd be fine going with Supabase, even though it's in alpha. I think I can wrap my head around relational/SQL vs. document-store DBs for prod.
But I'll check out Turso regardless; it's always good to know the options!
What do you think of Redis? I'm not familiar with Upstash, but I have some experience with Redis.
If you don't mind, I'd like to ask you sometime about your DB setup, @khuezy. I'm slammed for time right now, but I'm wondering whether there's an opportunity for me to improve.