Building an open source, accessibility-focused image describer

We’re all about experimentation at Fly.io, and we want to be the easiest place for you to run GPU-powered experiments without breaking the bank or needing a PhD in your cloud provider of choice to figure things out.

As a blind screen reader user, I’ve been interested in automated image descriptions and how they can form a foundation for entirely new types of accessibility tools, but I didn’t have a good GPU or the time to figure out where to rent one. Now that Fly.io has GPUs, I made my own describer and hosted it on Fly.io. I would love to know your thoughts on it, particularly if there are ways it might be improved.

LLM-Describer is a PocketBase project which, when pointed at an Ollama instance, provides a simple authenticated API for users to submit images, get descriptions, and ask follow-up questions about them. It also includes a simple Python program to put it through its paces.
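To give a rough sense of what talking to it might look like from a client, here's a minimal Python sketch using `requests`. The deployment URL, the `users` and `images` collection names, the field names, and the `/ask` route are all assumptions for illustration (the auth and record routes follow PocketBase's standard REST conventions); check the repo for the real API.

```python
# Hypothetical client sketch. Collection names, field names, and the /ask
# route are assumptions, not the project's actual API; only the auth and
# record endpoints follow PocketBase's standard REST conventions.
import requests

BASE = "https://your-describer.fly.dev"  # assumed deployment URL

# Authenticate against an assumed "users" collection.
auth = requests.post(
    f"{BASE}/api/collections/users/auth-with-password",
    json={"identity": "me@example.com", "password": "correct-horse-battery"},
).json()
headers = {"Authorization": auth["token"]}

# Upload an image for description (assumed "images" collection and field).
with open("photo.jpg", "rb") as f:
    record = requests.post(
        f"{BASE}/api/collections/images/records",
        headers=headers,
        files={"image": f},
    ).json()

# Fetch the record again once the model has produced a description.
described = requests.get(
    f"{BASE}/api/collections/images/records/{record['id']}",
    headers=headers,
).json()
print(described.get("description"))

# Ask a follow-up question about the same image (hypothetical custom route).
answer = requests.post(
    f"{BASE}/api/collections/images/records/{record['id']}/ask",
    headers=headers,
    json={"question": "What color is the dog's collar?"},
).json()
print(answer.get("answer"))
```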

I have an upcoming blog post describing this project in greater detail, but I’m not kidding when I call this an experiment. I’m new to both AI development and working with Go, so there are probably things I could do differently or ways I can improve. This is meant as a sample project, so I’m trying not to go too far into the weeds, but I’m open to any thoughts or suggestions that don’t introduce a lot of scope creep or change the project substantially.

Looking forward to reading your feedback! This has been a blast to make, and I’m glad to finally share it with folks.
