Delayed Rails error after ~24 hours with key_derivation_salt losing configuration

Hi folks,

I have a simple Rails app that uses the encrypts feature. It works fine in other environments (including a long-running local dev server), but once deployed, after some time (typically around 24 hours), anything that accesses the model that uses encrypts throws an exception:

[...snip...] ActiveRecord::Encryption::Errors::Configuration (key_derivation_salt is not configured. Please configure it via credential active_record_encryption.key_derivation_salt or by setting config.active_record.encryption.key_derivation_salt):

If I redeploy (using flyctl deploy), the error goes away and things are fine for a while longer. To make sure the values were being set and the environment wasn’t vanishing, I read them from ENV variables in config/database.yml:

  active_record_encryption:
    primary_key: <%= ENV.fetch("ACTIVERECORD_PRIMARY_KEY") %>
    deterministic_key: <%= ENV.fetch("ACTIVERECORD_DETERMINISTIC_KEY") %>
    key_derivation_salt: <%= ENV.fetch("ACTIVERECORD_KEY_SALT") %>

Each value is set, and it works without error for a while, but when I come back the next morning it’s broken again until I run flyctl deploy.
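
For reference, the exception message itself names a second way to supply these values: config.active_record.encryption. Below is a minimal sketch of that route, assuming the same three ENV names as above. apply_encryption_keys is a hypothetical helper, not part of Rails, and I have not confirmed that this changes the behavior described here.

```ruby
# Hypothetical helper (not a Rails API): copies the three encryption values
# from the environment onto a config object that responds to these setters.
# fetch raises KeyError when a variable is missing, so a misconfigured
# machine fails at boot instead of on first access to an encrypted attribute.
def apply_encryption_keys(encryption_config, env = ENV)
  encryption_config.primary_key         = env.fetch("ACTIVERECORD_PRIMARY_KEY")
  encryption_config.deterministic_key   = env.fetch("ACTIVERECORD_DETERMINISTIC_KEY")
  encryption_config.key_derivation_salt = env.fetch("ACTIVERECORD_KEY_SALT")
  encryption_config
end

# Assumed usage inside config/application.rb:
#   apply_encryption_keys(config.active_record.encryption)
```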

What’s really peculiar is that flyctl restart <app> does not seem to fix the issue; I have to run flyctl deploy. I’m not confident in this, though, as I’ve only tried flyctl restart once, so it may be a red herring.

I’m a bit at a loss for how to debug this, and I’m hoping someone can give me some insight. Even if I put the key values directly in config/database.yml, the problem still occurs.

Any help is appreciated!

Can you try running fly ssh console then env | grep SALT and see if it appears in your env?
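
The same check can also be run from inside the app’s Ruby process (for example in a Rails console on the machine), which avoids echoing the secret values. A rough sketch, assuming the three variable names from the original post; missing_encryption_env is a made-up helper name:

```ruby
# Variable names taken from the config shown earlier in the thread.
ENCRYPTION_ENV_NAMES = %w[
  ACTIVERECORD_PRIMARY_KEY
  ACTIVERECORD_DETERMINISTIC_KEY
  ACTIVERECORD_KEY_SALT
].freeze

# Hypothetical helper: returns the names that are unset or blank in env,
# without ever printing the values themselves.
def missing_encryption_env(env = ENV)
  ENCRYPTION_ENV_NAMES.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_encryption_env
puts(missing.empty? ? "all encryption variables are set" : "missing: #{missing.join(', ')}")
```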

The only guess I have here is that ActiveRecord’s encryption might not be handling database reconnects well. The proxy that runs in your Postgres will drop a connection after ~30 minutes of being idle. That’s not 24 hours, though, which is weird.

You can try connecting to port 5433 on your Postgres to see if you get different behavior.

Aha, I bet that the 30 minute reconnect is the culprit here (or contributes). The daily timing is probably down to my test flow: this triggers in the auth flow, and while I’m developing I’m only signing in once a day. There are no encrypted properties outside of that, so no errors are thrown in between!

To confirm, the env variables persist and don’t vanish (thankfully enough, that would be super weird!)

I’ll set up something that emulates a proxy that disconnects on a shorter timeout and see if I can trigger this in development mode, or in a simple reproducible setup. That’s a good thing to start working from (I also hadn’t updated to Rails 7.0.4 yet, just pushed that now).
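
A minimal sketch of what such a stand-in proxy could look like in plain Ruby. It only mimics the idle-drop behavior, not the real proxy; pump and start_idle_proxy are hypothetical names, and the ports and timeout in the usage note are arbitrary.

```ruby
require "socket"

# Hypothetical helper: copies bytes in both directions between two sockets and
# closes both once neither side has sent anything for idle_timeout seconds.
def pump(client, upstream, idle_timeout)
  loop do
    ready, = IO.select([client, upstream], nil, nil, idle_timeout)
    break if ready.nil? # idle for too long: drop the connection

    ready.each do |src|
      dst = src.equal?(client) ? upstream : client
      dst.write(src.readpartial(16_384))
    end
  end
rescue IOError, SystemCallError
  # One side hung up (EOFError is an IOError) or the connection was reset.
ensure
  client.close unless client.closed?
  upstream.close unless upstream.closed?
end

# Hypothetical helper: listens on 127.0.0.1:listen_port and relays each
# connection to upstream_host:upstream_port. Returns the listening server.
def start_idle_proxy(listen_port:, upstream_host:, upstream_port:, idle_timeout:)
  server = TCPServer.new("127.0.0.1", listen_port)
  Thread.new do
    loop do
      client = server.accept
      Thread.new(client) do |conn|
        begin
          upstream = TCPSocket.new(upstream_host, upstream_port)
        rescue SystemCallError
          conn.close # upstream unreachable: give up on this connection
          next
        end
        pump(conn, upstream, idle_timeout)
      end
    end
  end
  server
end
```

To try it, something like start_idle_proxy(listen_port: 6543, upstream_host: "127.0.0.1", upstream_port: 5432, idle_timeout: 10) followed by sleep would keep it running. Then point the development database config at port 6543, leave the app idle for longer than the timeout, and touch an encrypted attribute.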

I’m also curious why a flyctl restart didn’t recover when a flyctl deploy did, but I can save that mystery for another day.

Thanks for the help so far!