Hello
There was a service incident over the past 2-3 days, which has since been resolved. However, my Rails app keeps raising this ActiveRecord::ConnectionNotEstablished error in at_exit.
According to HoneyBadger, there have been 10,242 occurrences in total. Here is the description:
ActiveRecord::ConnectionNotEstablished: connection to server at "MAC Address", port 5432 failed: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
What’s wrong? And how can I fix it?
Thank you!
Hi… It looks like several people’s Postgres servers may have been thrown for a loop recently, although it’s not clear which platform incidents caused which exact app-level problems,
…
What does fly m list -a db-app-name show at the moment?
Also, is Rails attempting its connection via Flycast?
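In case the Flycast question is unclear: one rough way to check is to look at the hostname in the connection string your app uses. The sketch below assumes the app reads its connection string from DATABASE_URL, that Flycast hostnames end in ".flycast", and that direct private-network hostnames end in ".internal". The connection_route helper and the example URL are made up for illustration.

```ruby
require "uri"

# Classify how a Postgres URL reaches the database, judging only by hostname.
# Assumption: Flycast hostnames end in ".flycast" and direct private-network
# hostnames end in ".internal"; anything else is reported as :other.
def connection_route(database_url)
  host = URI.parse(database_url).host.to_s
  if host.end_with?(".flycast")
    :flycast
  elsif host.end_with?(".internal")
    :internal
  else
    :other
  end
end

# Hypothetical example URL; in a deployed app you would pass ENV["DATABASE_URL"].
puts connection_route("postgres://user:secret@my-app-db.flycast:5432/my_app")
```

You could run the same check from a Rails console on the app machine by passing ENV["DATABASE_URL"] instead of the example string.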
This is what I get:
3 machines have been retrieved from app my-app-db.
View them in the UI here (https://fly.io/apps/my-app-db/machines/)
my-app-db
ID              NAME               STATE    CHECKS  REGION  ROLE     IMAGE                               IP ADDRESS                       VOLUME                CREATED               LAST UPDATED          PROCESS GROUP  SIZE
99999999999999  empty-grass-9999   started  3/3     sea     replica  flyio/postgres-flex:16.4 (v0.0.62)  fdaa:9:5a3f:a7b:1a9:cdfc:19f3:2  vol_vppn6z3kpp9n7x2v  2024-09-30T16:48:03Z  2024-09-30T19:52:32Z                 shared-cpu-2x:2048MB
99999999999999  young-meadow-9999  started  3/3     sea     primary  flyio/postgres-flex:16.4 (v0.0.62)  fdaa:9:5a3f:a7b:1af:4acd:e052:2  vol_vg70p3ewz3gm180v  2024-09-30T16:46:40Z  2024-09-30T19:52:21Z                 shared-cpu-2x:2048MB
99999999999999  lively-rain-9999   started  2/3     sea     zombie   flyio/postgres-flex:16.4 (v0.0.62)  fdaa:9:5a3f:a7b:a5:4a13:41e2:2   vol_4505n3573g950zxr  2024-09-30T16:49:24Z  2024-09-30T19:53:18Z                 shared-cpu-2x:2048MB
To be honest, though, I don’t understand most of your answer. Sorry.
Is there a way I can fix this?
I tried restarting the “zombie” machine, and this is what I see now:
ID              NAME               STATE    CHECKS  REGION  ROLE                       IMAGE                               IP ADDRESS                       VOLUME                CREATED               LAST UPDATED          PROCESS GROUP  SIZE
99999999999999  empty-grass-9999   started  3/3     sea     replica                    flyio/postgres-flex:16.4 (v0.0.62)  fdaa:9:5a3f:a7b:1a9:cdfc:19f3:2  vol_vppn6z3kpp9n7x2v  2024-09-30T16:48:03Z  2024-09-30T19:52:32Z                 shared-cpu-2x:2048MB
99999999999999  young-meadow-9999  started  3/3     sea     primary                    flyio/postgres-flex:16.4 (v0.0.62)  fdaa:9:5a3f:a7b:1af:4acd:e052:2  vol_vg70p3ewz3gm180v  2024-09-30T16:46:40Z  2024-09-30T19:52:21Z                 shared-cpu-2x:2048MB
99999999999999  lively-rain-9999   started  1/3     sea     500 Internal Server Error  flyio/postgres-flex:16.4 (v0.0.62)  fdaa:9:5a3f:a7b:a5:4a13:41e2:2   vol_4505n3573g950zxr  2024-09-30T16:49:24Z  2024-11-28T02:26:34Z                 shared-cpu-2x:2048MB
failed to connect to local node: failed to connect to `host=fdaa:9:5a3f:a7b:a5:4a13:41e2:2 user=repmgr database=repmgr`: server error (FATAL: the database system is starting up (SQLSTATE 57P03))
Regards
Overall, the HA clusters sometimes require a lot of manual intervention and knowledge of the gritty internals…
(I think you’ve already seen @uncvrd’s classic post, for example!)
Basically, you can either follow steps along those lines to fix things from the inside, or take the simpler but less elegant forking approach:
https://community.fly.io/t/urgency-problems-with-postgres-the-database-is-not-responding/19926/2
(Ideally, there would be a fully managed alternative, which Fly is apparently still working on. That may end up costing ~$80/month, though. It’s rather unclear at this point…)
Thank you for your answer. I suspected I would need to use @uncvrd’s classic post, as I’d used it before. I was hoping there would be a more “automated” solution.
Thank you nonetheless.