My app is failing because it cannot connect to its database. I can connect locally by setting up a proxy with the fly command, so the db is there. This app has been working for years and has hardly been touched. Restarts did not help, so I decided to redeploy using `fly deploy`. Now the deploy fails too. I'm not sure what I can do; it looks very much like an infrastructure problem at Fly, which I am in no way able to fix myself.
Firstly, on deploys - show us the logs in this thread. The deploy command should show the issue in the output text. It may also be worth showing your TOML file, so we can see if you might be having healthcheck issues.
Have you ever been able to connect to a db from your app? What kind of db do you have? Have you tried shelling into a machine in your app and seeing if you can get a console connection from there?
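If no database console client is installed on the machine, a raw TCP probe can at least show whether the network path to the db is open. This is a minimal sketch, not anything Fly-specific: the class name `TcpProbe` and the method `canConnect` are made up for illustration, and the host and port you pass in are assumptions about your own setup.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks only that a TCP connection can be opened.
// It does not speak the Postgres protocol or verify credentials.
public class TcpProbe {

    // Returns true if a TCP connection to host:port opens within timeoutMillis.
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers timeouts, refused connections, and unresolvable host names.
            return false;
        }
    }
}
```

Calling `TcpProbe.canConnect(host, 5432, 3000)` with your database's internal host name (5432 being the default Postgres port) returns `false` on a timeout or refusal. A `false` from inside the machine combined with a working local proxy would point at the network path rather than at the database itself.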
The error is a socket timeout from the connection pool (HikariCP). The app has been running successfully with a database connection for years. The DB is a PostgreSQL instance I created when I first set up my app on Fly, some 4 years ago or so.
The deploy command failed, and the “old” version is now still trying to start, getting stuck in a restart loop as db connection fails.
The deploy command failed with:
Error: failed to update machine 90806250bed187: Unrecoverable error: timeout reached waiting for health checks to pass for machine 90806250bed187: failed to get VM 90806250bed187: Get "https://api.machines.dev/v1/apps/kortglad/machines/90806250bed187": net/http: request canceled
The private-network (6PN) egress on the machine was broken. I had to kill the machine and create a new one. Redeploys did not help because they kept reusing the same machine.
Ah, is that a self-hosted database in Fly infra? We had someone here the other day whose Fly Postgres database had become unreachable - I wonder if rebooting the db would help.
You may find reworking your healthcheck helpful - it could verify that the web listener is up without also requiring the db to be reachable. Then the db becomes something you can repair separately if required.
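One way to picture that split, as a rough sketch using only the JDK's built-in HTTP server: a liveness endpoint that never touches the database, and a separate readiness endpoint that does. The class name `HealthEndpoints`, the paths `/livez` and `/readyz`, and the `dbCheck` probe are all assumptions for illustration; your app's framework will have its own way of registering routes.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

// Hypothetical example: separates "is the web listener up" from "is the db reachable".
public class HealthEndpoints {

    // dbCheck is a caller-supplied probe, e.g. one that borrows a pooled
    // connection and validates it. Pass port 0 to bind an ephemeral port.
    public static HttpServer start(int port, BooleanSupplier dbCheck) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);

        // Liveness: answers 200 as long as the listener works; no db access.
        server.createContext("/livez", exchange -> respond(exchange, 200, "ok"));

        // Readiness: reports db state separately, so a db outage is visible
        // without failing the liveness check.
        server.createContext("/readyz", exchange -> {
            boolean dbUp;
            try {
                dbUp = dbCheck.getAsBoolean();
            } catch (RuntimeException e) {
                dbUp = false; // a throwing probe counts as "db unavailable"
            }
            respond(exchange, dbUp ? 200 : 503, dbUp ? "ready" : "db unavailable");
        });

        server.start();
        return server;
    }

    private static void respond(HttpExchange exchange, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }
}
```

Pointing the platform healthcheck at the liveness path would let a deploy go through while the db is down, and the readiness path stays available for monitoring or manual checks. Whether that trade-off suits your app depends on how useful it is without its database.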
My usual cautions apply - are you running the PG database as a single node? If so, this is a bit of a risk on Fly infra. Hosts fail from time to time, taking the on-host NVMe drives with them, in a big cloud of purple smoke. The daily snapshots are OK but I would not rely on them.
If you want this to be reliable, switch to Fly’s managed db product, or swap to an external managed service provider.