Adding pgvector to Fly Postgres

Vector databases are extremely popular right now because of the AI and machine learning tasks they help with. Luckily, pgvector, an open source extension by Andrew Kane, builds vector storage and similarity search right into Postgres!

Unfortunately, the Fly Postgres image does not include it by default, so I set out to add it:

Create a Dockerfile

FROM flyio/postgres-flex:15.2

# Install build dependencies
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential \
        curl \
        postgresql-server-dev-all

# Set the pgvector version
ARG PGVECTOR_VERSION=0.4.1

# Download and extract the pgvector release, build the extension, and install it
RUN curl -L -o pgvector.tar.gz "https://github.com/ankane/pgvector/archive/refs/tags/v${PGVECTOR_VERSION}.tar.gz" && \
    tar -xzf pgvector.tar.gz && \
    cd "pgvector-${PGVECTOR_VERSION}" && \
    make && \
    make install

# Clean up build dependencies and temporary files
RUN apt-get remove -y build-essential curl postgresql-server-dev-all && \
    apt-get autoremove -y && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* && \
    rm -rf /pgvector.tar.gz /pgvector-${PGVECTOR_VERSION}

ChatGPT helped me write this, crazy right?

Build and publish the image, replacing my user/image name flyjason/fly-pg-pgvector with your own:

$ docker build . -t flyjason/fly-pg-pgvector --platform "linux/amd64"
$ docker push flyjason/fly-pg-pgvector

Then let's create a Fly Postgres instance!

$ flyctl postgres create --image-ref flyjason/fly-pg-pgvector

And that's it! You can use this same method to add any extension you might be missing from the default Postgres setup, without losing all the nice tooling Fly provides!
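One thing the steps above do not cover: the image only makes the extension available, and each database still has to enable it with CREATE EXTENSION. Here is a minimal smoke test, assuming you have psql installed locally and a connection string for the new cluster. DATABASE_URL, the table name items and the sample vectors are placeholders of mine, not part of the original post. The SQL follows the pgvector README.

```shell
#!/bin/sh
# Sketch: print SQL that enables pgvector and runs a tiny nearest-neighbour query.
# The table name "items" and the sample vectors are illustrative only.
pgvector_smoke_sql() {
    cat <<'SQL'
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
SELECT id, embedding FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 1;
SQL
}

# Assumed usage (DATABASE_URL is a placeholder for your own connection string):
#   pgvector_smoke_sql | psql "$DATABASE_URL"
```

If the extension is installed correctly, the final query should return a row whose embedding is [1,2,3], the closer of the two sample vectors. If CREATE EXTENSION fails instead, the image most likely does not contain the built extension files.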

This was failing to build for me, so I switched to the newer image, flyio/postgres-flex:15.3.

Thanks @jstiebs. Is there a way to update the instance? I am not seeing a flyctl postgres update command. Let’s say you update the pg version, or add a different extension or configuration, and push a new Docker image. Can you update your existing instance?

I think you want fly image update

Is this the best strategy? Isn't it possible to just write a script and attach it to the Docker entrypoint, so we can keep using the default Postgres image? I believe that using a different image for Postgres will break the flyctl postgres commands.

It shouldn’t. You’re still using the Fly PG image as a base image, so fly pg commands will still work. With this method, you just need to manage your image/version in the Dockerfile and build/push manually, instead of using fly image update.

Thanks for sharing!

But I want to add pgvector to an existing Fly pg instance. Which fly commands should I use with the Dockerfile?
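For an existing cluster, one route that may work is the fly deploy --image option mentioned in the Fly docs quoted further down this thread. The sketch below is untested against a live cluster. The app name my-pg-app and the image name are placeholders, and it assumes fly config save can fetch the cluster's current fly.toml into the working directory. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: PG_APP and IMAGE are placeholders for your own values.
PG_APP="${PG_APP:-my-pg-app}"
IMAGE="${IMAGE:-USERNAME/fly-pg-pgvector}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

update_pg_image() {
    # Fetch the cluster's current fly.toml into the working directory.
    run fly config save --app "$PG_APP"
    # Redeploy the same app from the custom image built from the Dockerfile.
    run fly deploy --app "$PG_APP" --image "$IMAGE"
}

update_pg_image
```

Taking a snapshot or backup first seems wise, and the extension still has to be enabled in each database with CREATE EXTENSION vector after the new image is running.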

I don’t think the current version of flyio/postgres-flex:15.3 is working either. Could someone validate it? It fails to create an instance, and it's not clear where the failure occurs.

2023-09-02T09:17:53Z app[6e82d04c2d3387] sin [info]2023/09/02 09:17:53 listening on [fdaa:0:4a0f:a7b:ead:68aa:124c:2]:22 (DNS: [fdaa::3]:53)
2023-09-02T09:17:54Z app[6e82d04c2d3387] sin [info][ 2.300427] reboot: Restarting system
2023-09-02T09:18:29Z app[6e82d04c2d3387] sin [info][ 0.057849] PCI: Fatal: No config space access function found
2023-09-02T09:18:29Z app[6e82d04c2d3387] sin [info] INFO Starting init (commit: 5293a085)...
2023-09-02T09:18:29Z app[6e82d04c2d3387] sin [info] INFO Mounting /dev/vdb at /data w/ uid: 0, gid: 0 and chmod 0755
2023-09-02T09:18:29Z app[6e82d04c2d3387] sin [info] INFO Resized /data to 1069547520 bytes
2023-09-02T09:18:29Z app[6e82d04c2d3387] sin [info] INFO Preparing to run: `docker-entrypoint.sh start` as root
2023-09-02T09:18:29Z app[6e82d04c2d3387] sin [info] INFO [fly api proxy] listening at /.fly/api
2023-09-02T09:18:29Z app[6e82d04c2d3387] sin [info]2023/09/02 09:18:29 listening on [fdaa:0:4a0f:a7b:ead:68aa:124c:2]:22 (DNS: [fdaa::3]:53)
2023-09-02T09:18:29Z app[6e82d04c2d3387] sin [info]Provisioning primary
2023-09-02T09:18:30Z app[6e82d04c2d3387] sin [info]panic: failed to initialize postgres failed to init postgres: exit status 1
2023-09-02T09:18:30Z app[6e82d04c2d3387] sin [info]goroutine 1 [running]:
2023-09-02T09:18:30Z app[6e82d04c2d3387] sin [info]main.panicHandler({0x9a7320?, 0xc0001cce70})
2023-09-02T09:18:30Z app[6e82d04c2d3387] sin [info] /go/src/github.com/fly-apps/fly-postgres/cmd/start/main.go:188 +0x55
2023-09-02T09:18:30Z app[6e82d04c2d3387] sin [info]main.main()
2023-09-02T09:18:30Z app[6e82d04c2d3387] sin [info] /go/src/github.com/fly-apps/fly-postgres/cmd/start/main.go:65 +0xe5e
2023-09-02T09:18:30Z app[6e82d04c2d3387] sin [info] INFO Main child exited normally with code: 2
2023-09-02T09:18:30Z app[6e82d04c2d3387] sin [info] INFO Starting clean up.
2023-09-02T09:18:30Z app[6e82d04c2d3387] sin [info] INFO Umounting /dev/vdb from /data
2023-09-02T09:18:30Z app[6e82d04c2d3387] sin [info] WARN hallpass exited, pid: 265, status: signal: 15 (SIGTERM)
2023-09-02T09:18:30Z app[6e82d04c2d3387] sin [info]2023/09/02 09:18:30 listening on [fdaa:0:4a0f:a7b:ead:68aa:124c:2]:22 (DNS: [fdaa::3]:53)
2023-09-02T09:18:31Z app[6e82d04c2d3387] sin [info][ 2.294952] reboot: Restarting system
2023-09-02T09:18:32Z health[6e82d04c2d3387] sin [error]Health check for your postgres role has failed. Your cluster's membership is inconsistent.

2023-09-02T09:18:50Z health[6e82d04c2d3387] sin [error]Health check for your postgres role has failed. Your cluster's membership is inconsistent.

For anyone experiencing a similar issue: it could be a problem with the Postgres image built by your local Docker platform. Try using Fly's remote builder instead.
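Before switching builders, it can help to check which platform the local build actually produced. This is a small sketch, assuming Docker is installed and the image exists locally. IMAGE is a placeholder, and linux/amd64 is taken from the --platform flag used in the original post.

```shell
#!/bin/sh
# Sketch: IMAGE is a placeholder for the image you built locally.
IMAGE="${IMAGE:-USERNAME/fly-pg-pgvector}"

# Ask the local Docker daemon which platform the image was built for.
image_platform() {
    docker image inspect --format '{{.Os}}/{{.Architecture}}' "$1"
}

# Succeed only when the platform string matches the one used in this thread.
is_expected_platform() {
    [ "$1" = "linux/amd64" ]
}

# Assumed usage (needs Docker and a locally available image):
#   if is_expected_platform "$(image_platform "$IMAGE")"; then
#       echo "platform ok"
#   else
#       echo "rebuild with --platform linux/amd64 or use the remote builder"
#   fi
```

A platform mismatch is only one possible cause of the panic in the logs above, so a matching platform does not rule out other problems with the image.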

The steps below are courtesy of kylemclaren.

Create a new app with fly create my-app, then, using your preferred Dockerfile, run fly deploy --build-only --push --remote-only --dockerfile Dockerfile --app my-app

Then, using the built image, create the new Postgres app with fly pg create --image-ref registry.fly.io/my-app:deployment...
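The remote-builder steps above can be sketched as a small script. The names here are placeholders: my-app is the throwaway app that only exists to hold the image, and the image reference for the last step has to be copied from the output of the deploy command, since its tag is generated at build time. I have used fly apps create for the first step. By default the script only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch only: BUILD_APP is a placeholder for an app that just holds the image.
BUILD_APP="${BUILD_APP:-my-app}"

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Build the Dockerfile on Fly's remote builder and push it to Fly's registry.
build_remote() {
    run fly apps create "$BUILD_APP"
    run fly deploy --build-only --push --remote-only \
        --dockerfile Dockerfile --app "$BUILD_APP"
}

# Create the Postgres cluster from the pushed image.
# $1 is the full image reference printed by the deploy step.
create_pg() {
    run fly pg create --image-ref "$1"
}

build_remote
# Then, with the reference copied from the deploy output:
#   create_pg "<image reference from the deploy output>"
```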

I am getting the same error as @dharmaraja, even after following the steps in their post above (#9 in this thread).

I am getting this same error when I followed the original steps of this post, made my own docker image and tried to spin up a postgres app with that image:

fly postgres create --image-ref justinhenricks/fly-pg-pgvector

Does anyone know of a good way to get a Postgres database with pgvector up and running on Fly?

The instructions should work. Are you using flyio/postgres-flex:15.3 as the base image in your Dockerfile?

I am, yes.

Ok, it’s probably something simple, so you should definitely post some more details about all the steps you tried, and where things failed.

Hey Kyle, I posted a pretty in-depth rundown of exactly what I tried here: How-To Create Fly Postgres DB with pgvector

For anyone still wrestling with this: I was able to build a Postgres 15 Docker image with pgvector v0.5 and push it to Docker Hub.

The Dockerfile is above, and I’ve got it deployed and working on fly.io.

Heads up: this documentation on fly.io has some out-of-date information.

The Fly Postgres app is fully open source. Just fork fly-apps/postgres-ha and add whatever meets your needs. You can even update an existing cluster with your new image using fly deploy --image. One caveat is that once you fork, you won’t be able to use fly postgres commands to administer your app.

fly postgres commands do indeed work with forked images. Also, that docs link should probably point to the fly-apps/postgres-flex repo (Postgres HA setup using repmgr) instead of fly-apps/postgres-ha (Postgres + Stolon for HA clusters as Fly apps). I also wonder whether the postgres-ha repo should get a deprecation notice. I spent some unnecessary time following those instructions, not realizing there was a newer image.

In case someone else has encountered this, @jstiebs’s solution worked for me!

Thank you so much @jstiebs <3!

Another update here. I would recommend building the Docker image using the Dockerfile from the pgvector/pgvector GitHub repo (open-source vector similarity search for Postgres), as it contains lots of little details that keep the image size-optimized.

git clone --branch v0.6.0 https://github.com/pgvector/pgvector.git

Then edit the Dockerfile, changing the top from:

ARG PG_MAJOR=16
FROM postgres:$PG_MAJOR

to:

ARG PG_MAJOR=15
FROM flyio/postgres-flex:15.3

(15.3 is the latest postgres-flex tag as of this post.)

Then simply:

docker build -t USERNAME/fly-pgvector . --platform "linux/amd64"
docker push USERNAME/fly-pgvector

fly postgres create --image-ref USERNAME/fly-pgvector
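If you would rather script the Dockerfile edit than do it by hand, something like the following could work. It is a sketch that assumes the top of pgvector's Dockerfile looks exactly like the two lines quoted above. The function name retarget_dockerfile is mine, and if the upstream file changes, the patterns will silently stop matching.

```shell
#!/bin/sh
# Sketch: rewrite the top of pgvector's Dockerfile so the image builds on
# Fly's postgres-flex image instead of the stock postgres one.
retarget_dockerfile() {
    file="$1"
    tmp="$(mktemp)"
    # Only lines that match exactly are rewritten; everything else is kept.
    sed -e 's|^ARG PG_MAJOR=.*$|ARG PG_MAJOR=15|' \
        -e 's|^FROM postgres:\$PG_MAJOR$|FROM flyio/postgres-flex:15.3|' \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Assumed usage, run from inside the cloned pgvector checkout:
#   retarget_dockerfile Dockerfile
```

It is worth checking the first lines of the Dockerfile afterwards (for example with head) to confirm both substitutions were applied before building.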

I also pushed a version, in case anyone wants to use it: jkonowitch/fly-pgvector on Docker Hub.

I have taken the solution proposed by @jkonowitch and created a repo with these changes, plus instructions on how to incorporate it into a GitHub Action. Note that I would still like to build a route for people to install via an image, though I think that makes it harder to pull in the latest pgvector and Fly Postgres changes, because I’m not aware of a flow for updating a Fly Postgres app with a Docker image from the hub after it has been initially created.