Install Rack middleware and modify the ApplicationController to handle Postgres readonly errors and send back the fly-replay header. This is easy to implement in a controller, but it might be a little tricky to do as middleware.
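As a rough illustration of the middleware variant, here is a minimal sketch. The class name `FlyReplayOnReadOnly`, the `PRIMARY_REGION` variable, the 409 status, and matching on the Postgres error message are all assumptions for illustration, not a settled design.

```ruby
# Sketch of a Rack middleware that replays failed writes in the primary region.
# Assumptions: the class name, the PRIMARY_REGION variable, the 409 status,
# and detecting the error by its message text.
class FlyReplayOnReadOnly
  # Postgres reports writes against a replica as
  # "cannot execute ... in a read-only transaction".
  READ_ONLY_PATTERN = /read-only transaction/i

  def initialize(app, primary_region: ENV["PRIMARY_REGION"])
    @app = app
    @primary_region = primary_region
  end

  def call(env)
    @app.call(env)
  rescue StandardError => e
    # Re-raise anything that is not a read-only error, or when no primary
    # region is known.
    raise unless @primary_region && read_only_error?(e)

    # Ask the Fly proxy to replay this request in the primary region.
    [409, { "fly-replay" => "region=#{@primary_region}" }, []]
  end

  private

  # Walk the cause chain, since ActiveRecord wraps the driver's exception.
  def read_only_error?(error)
    while error
      return true if error.message.match?(READ_ONLY_PATTERN)
      error = error.cause
    end
    false
  end
end
```

The tricky part is probably placement: the middleware only sees the exception if it sits inside whatever Rails middleware renders error pages (such as `ActionDispatch::ShowExceptions`), otherwise the error has already been turned into a 500 response.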
If the gem doesn’t detect these environment variables, it should print a warning to logs and do nothing.
It would be nice to make these environment variables configurable, or at least make it easy to add config options for v2.
We think this is a ~4 hour project for experienced Rails and Ruby gem developers. We’ll pay you $1,000 to build the first version for us! Post here if you’re interested, link to previous Ruby / Rails / Rack code and we’ll let you know when to go.
I’m interested in this, as I had already started testing this behavior in a Rails app. Questions:
Why do you prefer middleware to hooking into controllers for exception handling?
Why not also offer a simpler option to redirect on any non-GET request (POST/PUT/PATCH/DELETE)? For most apps, this would avoid unnecessary code paths.
Here’s some ridiculously old Rails plugin code, and a simple example of a helper module I use across projects.
OK, I’ll add a config interface to be used in an initializer. I named the gem fly-rails as I think it would make sense to add other enhancements in the same place, such as a prometheus metrics exporter, rather than split it all up into smaller gems. What do you think?
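A config interface along these lines might look like the sketch below. The `FlyRails` module, the `Configuration` class, and the default variable names `FLY_REGION` and `PRIMARY_REGION` are assumptions for illustration; the warn-and-do-nothing behavior follows the original request.

```ruby
require "logger"

# Hypothetical configuration object for the gem. The environment variable
# names are assumed defaults and can be overridden in an initializer.
module FlyRails
  class Configuration
    attr_accessor :current_region_env_var, :primary_region_env_var, :logger

    def initialize
      @current_region_env_var = "FLY_REGION"
      @primary_region_env_var = "PRIMARY_REGION"
      @logger = Logger.new($stderr)
    end

    def current_region(env = ENV)
      env[current_region_env_var]
    end

    def primary_region(env = ENV)
      env[primary_region_env_var]
    end

    # Returns true only when both variables are set. Otherwise logs a warning
    # and returns false, so the gem can stay inert.
    def enabled?(env = ENV)
      missing = [current_region_env_var, primary_region_env_var]
                .select { |name| env[name].to_s.empty? }
      return true if missing.empty?

      logger.warn("fly-rails disabled: missing #{missing.join(', ')}")
      false
    end
  end

  def self.configuration
    @configuration ||= Configuration.new
  end

  def self.configure
    yield configuration
  end
end

# In config/initializers/fly.rb (hypothetical):
#   FlyRails.configure do |config|
#     config.primary_region_env_var = "MY_PRIMARY_REGION"
#   end
```

Keeping the variable names as plain attributes leaves room to add more options later without changing the initializer's shape.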
Otherwise, a weakness with this approach would come up when a replica automatically gets promoted to primary, as I believe is the case with your Postgres cluster today. This is why I would add an option to send all POST/PUT/DELETE requests to the primary region, and only rely on read-only exceptions as a last resort, or not at all (IMO) by offering another way to force GET requests to redirect.
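The method-based option could be sketched roughly as follows. The class name `FlyReplayWrites`, the keyword arguments, and the 409 status are assumptions; the only idea taken from the thread is replaying write requests in the primary region before they reach the app.

```ruby
# Sketch: replay every write request in the primary region up front, instead
# of waiting for a read-only exception. All names here are hypothetical.
class FlyReplayWrites
  WRITE_METHODS = %w[POST PUT PATCH DELETE].freeze

  def initialize(app, current_region:, primary_region:)
    @app = app
    @current_region = current_region
    @primary_region = primary_region
  end

  def call(env)
    if replay?(env)
      [409, { "fly-replay" => "region=#{@primary_region}" }, []]
    else
      @app.call(env)
    end
  end

  private

  def replay?(env)
    # Stay inert when either region is unknown.
    return false unless @current_region && @primary_region

    WRITE_METHODS.include?(env["REQUEST_METHOD"]) &&
      @current_region != @primary_region
  end
end
```

This keeps the decision independent of which Postgres node currently accepts writes, at the cost of replaying write requests that would not have touched the database.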
I do agree that having a Fly gem is a good idea. One thing we might want to add to this is simulated synchronous replication. You can actually query Postgres for replica status, so if the write request knows which follower a user will hit next time, it could wait for replication there to finish before returning. But I think this’ll need some experimentation.
Would this be to prevent something like ‘create and redirect to show’ from failing?
For this case, we could do something mentioned in another thread: force that subsequent read request to the primary region. The middleware could append a fly_region=iad param to the redirect URL, then replay any requests with this param.
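A rough sketch of that redirect-param idea is below. The class name `FlyRegionParam` and the constructor arguments are hypothetical, and it only honors a `fly_region` value that matches the configured primary, which is an extra assumption to avoid replaying to arbitrary regions.

```ruby
require "uri"

# Sketch: tag redirects that follow a write with fly_region=<primary>, and
# replay any request carrying that param. All names here are hypothetical.
class FlyRegionParam
  PARAM = "fly_region"

  def initialize(app, current_region:, primary_region:)
    @app = app
    @current_region = current_region
    @primary_region = primary_region
  end

  def call(env)
    if @current_region != @primary_region && tagged?(env)
      return [409, { "fly-replay" => "region=#{@primary_region}" }, []]
    end

    status, headers, body = @app.call(env)
    # Header names are lowercase in Rack 3 and mixed case in older versions.
    key = headers.keys.find { |k| k.casecmp?("location") }
    if key && env["REQUEST_METHOD"] != "GET" && (300..399).cover?(status)
      headers[key] = with_region(headers[key])
    end
    [status, headers, body]
  end

  private

  def tagged?(env)
    env["QUERY_STRING"].to_s.split("&").include?("#{PARAM}=#{@primary_region}")
  end

  # Append the param while preserving any existing query string.
  def with_region(location)
    uri = URI.parse(location)
    params = URI.decode_www_form(uri.query.to_s) << [PARAM, @primary_region]
    uri.query = URI.encode_www_form(params)
    uri.to_s
  end
end
```

One side effect of this design is that the param stays in the URL the user sees, so a bookmarked or shared link would keep being replayed to the primary.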
That said, checking for replication status could end up being faster, if more platform-specific. But, in a high traffic app it’s not clear you could always rely on this technique.
Yes, exactly. One simple way to start would be to store a "read from primary region until now + 5 seconds" timestamp in the session. The gem could look for that and send a fly-replay response, like it does for errors.
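The session-based idea might look something like this sketch. The class name, the session key `fly_primary_until`, and the injectable clock are hypothetical; the 5 second window comes from the suggestion above. It assumes the session is readable in every region (for example a cookie store) and that this middleware runs inside the session middleware.

```ruby
# Sketch: after a write handled in the primary region, pin the user's reads
# to the primary for a short window. All names here are hypothetical.
class FlyStickyPrimary
  SESSION_KEY = "fly_primary_until"
  WINDOW_SECONDS = 5
  READ_METHODS = %w[GET HEAD OPTIONS].freeze

  def initialize(app, current_region:, primary_region:, clock: -> { Time.now.to_i })
    @app = app
    @current_region = current_region
    @primary_region = primary_region
    @clock = clock
  end

  def call(env)
    session = env["rack.session"]
    in_primary = @current_region == @primary_region

    # On a replica, replay while the window set by a recent write is open.
    if !in_primary && session && session[SESSION_KEY].to_i > @clock.call
      return [409, { "fly-replay" => "region=#{@primary_region}" }, []]
    end

    response = @app.call(env)

    # In the primary, open the window after any write request.
    if in_primary && session && !READ_METHODS.include?(env["REQUEST_METHOD"])
      session[SESSION_KEY] = @clock.call + WINDOW_SECONDS
    end
    response
  end
end
```

A fixed window is only a guess at replication lag, so the 5 seconds would likely need tuning or replacing with a real replication check.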
I do think we can come up with something to solve this in the gem, though. It’s a good thing to add. We’ll probably end up porting any decisions here to other frameworks.