Unrequested postgres upgrade

Fly staff should never take action against a user’s environment without it being specifically requested or approved. I’ve seen responses to other threads that made me think this was happening, but had hoped that some authorization was given via DM or another outside channel. However, this just occurred on one of my instances: a Postgres cluster was redeployed after I had asked how I would go about upgrading it.

I’m not going to fault the individual in this case, because this seems to be normal practice. My request is that this change. In this case, it was against a staging environment for an app that hasn’t launched yet, so it didn’t cause any harm. However, I plan on running production traffic soon, and it seems many others are already doing so. It’s not okay for changes to occur to our environments without explicit permission.

Obviously there are exceptions, such as protecting the health of the overall infrastructure or repairing a fault that has brought a service offline. However, in all other cases it should be worth the time for Fly staff to ask before taking action against a user’s environment.

Can this please be the policy moving forward? If not, I’d like to know and I will be moving my services off of Fly before launch.

There are two things to be aware of here:

  1. Postgres clusters are pseudo-managed: we upgrade them quite frequently. If you’d like, you can deploy a Postgres cluster manually that we won’t treat as part of the “postgres fleet”, and we won’t perform any automated maintenance on it.
  2. Apps on Fly are not like traditional VMs and will stop/start/migrate frequently as part of automated system changes. These should always be zero downtime (unless we have an actual outage), but you should expect individual app VMs to be disposable.

These are things we should document, and it’s good you opened this discussion. Hopefully the way we run these works for you and you won’t need to migrate!

By the way, instructions for manually deploying a Postgres cluster are here: fly-apps/postgres-ha on GitHub (Postgres + Stolon for HA clusters as Fly apps).

We can also just update your existing one in our DB if you’d like.

I understand that the VMs are not guaranteed to be long lived; that’s pretty common with hosted services. Typically that means that changes are made to the underlying hardware to handle things like load balancing, hardware upgrades, etc. However, direct changes to a user’s environment should still require some sort of permission or at least notification. In this case, I found out it was happening because alerts were thrown, and when I went in to check my logs I saw the deployment had occurred.

As for Postgres being pseudo-managed, I think that needs to be defined. The way it is documented today, it seems like the images that are used are managed by Fly and there is some nice automation around a Postgres deployment, but the instances themselves are still our responsibility to maintain. In the posted documentation (Multi-region PostgreSQL · Fly), things like scaling, region assignment, IP addresses, and even deployment of new images are documented as user tasks. To me, this isn’t a managed service. And if it is, why was an upgrade from a version that was known to have alerting issues not done until I asked questions about it?

Beyond postgres, is there a policy in place that a user’s environment will not be modified without their permission unless it is for something that is directly affecting the health of the Fly infrastructure or stopping something that is in violation of an agreement (such as hosting of illegal content)?

Yeah, Postgres clusters provisioned with fly pg create are a special case. We don’t touch other kinds of apps without communicating with people ahead of time.

In the case of your Postgres cluster, we fixed a bunch of alerting clusters all at the same time. I’m guessing your post happened around the time we noticed the issues. I totally understand not wanting your DB touched though! We have a way to do that, just let us know.