I have a RAG chatbot to deploy, written in Python, using Ollama with the Gemma 2 model plus ChromaDB as the vector store. I've bolded my actual questions so this is skimmable. Many thanks!
I've been watching all of Fly's GPU-related videos on YouTube.
To query the LLM from the app, I'm not making raw HTTP requests but using the Ollama Python package. **Does this matter?**
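For reference, the querying code looks roughly like this (a sketch; the `ollama-app.internal` hostname is my assumption of how Fly private networking would address the Ollama app):

```python
# Sketch: querying the model through the Ollama Python package.
# The host URL is an assumption -- on Fly, an internal app should be
# reachable over private networking at <app-name>.internal.
from ollama import Client

client = Client(host="http://ollama-app.internal:11434")

response = client.chat(
    model="gemma2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response["message"]["content"])
```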
**If Ollama is hosted on its own internal VM, would I also need to attach a volume to it to store my model(s)?** (The first video does this.)
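If I understand the video right, the Ollama app's fly.toml would carry something like this (app and volume names are placeholders):

```toml
# Sketch of the Ollama app's fly.toml. Without a volume, pulled
# models vanish whenever the machine is recreated. Ollama stores
# models under /root/.ollama by default when running as root.
app = "ollama-app"

[mounts]
  source = "ollama_models"      # created with: fly volumes create ollama_models
  destination = "/root/.ollama" # Ollama's default model directory
```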
I'm using ChromaDB as the vector database/embedding store for the RAG functionality of the chatbot.
**Would this component, like Ollama, call for an internal VM app setup that the web app makes requests to?**
I got some excellent help, including diagrams, on the Chroma Discord, and it seems I can choose from a few options.
Option 3 seems to be what I'd prefer based on my research, if I'm on the right track with a Web App + Ollama + ChromaDB network setup. **But I don't understand why the web app (user app) would need a volume if the Chroma app has one. Also, which of these apps would need a GPU: only the Ollama VM storing the models, that one plus the web app, or all three?**
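My own guess from Fly's GPU docs is that only the Ollama app needs a GPU machine, i.e. something like the following in its fly.toml (the size preset here is just an example):

```toml
# Guess, per Fly's GPU docs: only the Ollama app gets a GPU VM;
# the web app and Chroma can run on ordinary CPU machines.
[[vm]]
  size = "a100-40gb"  # one of Fly's GPU machine presets
```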
The bottom, but with an additional GPU app that runs your LLM. Your chat app would talk to your DB and proxy requests to the Ollama server.
You might want to wrap your Ollama server with something light to terminate the machine early to save money.
Yes, that's what I'm thinking. **Would the chat app using the Python package for Chroma work for its communication with the ChromaDB Fly app?**
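Something like this is what I have in mind (a sketch; the internal hostname is my assumption, and 8000 is Chroma's default server port):

```python
# Sketch: the chat app talking to the Chroma Fly app via the
# chromadb Python client over Fly private networking.
import chromadb

chroma = chromadb.HttpClient(host="chroma-app.internal", port=8000)

collection = chroma.get_or_create_collection("docs")
results = collection.query(
    query_texts=["What does the manual say about setup?"],
    n_results=3,  # top-k chunks to feed into the RAG prompt
)
```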
As far as wrapping goes, I'm not sure… I've struggled with llama.cpp, though I know it's lighter.
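If I'm reading Fly's docs right, the proxy's auto-stop settings might cover the "terminate early to save money" part without a custom wrapper. An untested sketch:

```toml
# Untested sketch: let Fly's proxy stop the GPU machine when idle,
# assuming Ollama is exposed as an http_service on its default port.
[http_service]
  internal_port = 11434        # Ollama's default port
  auto_stop_machines = true    # stop the machine when traffic goes idle
  auto_start_machines = true   # wake it on the next request
  min_machines_running = 0     # allow scale to zero
```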
I agree; it's been great for me. Thanks for your help! One quick final question, if you know: **are process groups justified here, and where would they go?** I'm a bit stuck on that.
Thank you so much for your help @khuezy. I'm working on a setup aligned with that third diagram. I certainly feel you've provided a solution, but I'm not sure it matters which post I mark as such…?
Also, if I can avoid process groups, I may choose to, as they're a bit confusing for my use case.
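For anyone following along, my understanding is that process groups would look something like this in a single app's fly.toml (the commands are hypothetical):

```toml
# Hypothetical sketch of process groups: named commands in one app,
# each group runnable on its own machines. For my setup, separate
# Fly apps seem simpler, since Ollama and Chroma need different
# images, volumes, and VM sizes anyway.
[processes]
  web = "python main.py"       # the chat web app (command is a guess)
  worker = "python ingest.py"  # e.g. a doc-ingestion job (hypothetical)
```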
I'll probably share at least some of the config I come up with in this thread, especially the VM type(s) each app gets. I know the LLM app (Ollama) and the DB app (Chroma) would use volumes (**but would my web app need one too??**). I'm also very comfortable with Postgres, and I was originally testing RAG with Supabase as my vector DB before Chroma's retrieval worked out a bit better.
**But would switching from Chroma to Fly Postgres save costs compared to using volumes in my apps?** I'm leaning toward no, since Fly Postgres still uses volumes… I may experiment to find out, since I'm asking so many questions.
Got it. I've been a bit fuzzy on how I'll manage Chroma with RAG, and for a managed option I'd be fine going with Supabase, even though it's in alpha. I think I can wrap my head around relational/SQL vs. document-store DBs for prod.
But I'll check out Turso regardless; it's always good to know the options!
What do you think of Redis? I'm not familiar with Upstash, but I have some experience with Redis.
If you don't mind, I'd like to ask you sometime about your DB setup, @khuezy. I'm slammed for time right now, but I'm wondering whether there's an opportunity for me to improve.