Database connectivity issues

I've had a database connected to my app for months, but now the app is suddenly unable to connect. I also tried proxying to it with flyctl, but it can't find the hostname.

My dashboard is green, and so is the Fly status page, so I am a bit lost. My app has been unchanged for a month and had been working with no problems until I discovered this today.

Hey there,

Looks like this was on a host that crashed and was restarted on the 9th. It started back up at 4:32 PM and your machine was restarted a few minutes after that.

Seems like there was a bug and the private IP addresses were not properly registered in our state.

I stopped and started your machine, and it is back up; DNS responses look good :+1:. That let the app that connects to the database start back up too.

We’re going to look through what might’ve happened and implement a more permanent fix.

Great, thanks! Any way I could have detected this? Either automatically (from the app) or manually? And how could I have fixed it myself (automatically)?

Hi, I am having the same issue! I tried fly apps restart --force-stop, but I am still getting an :nxdomain error.

The temporary fix is to restart your database’s machine, not the app.

I thought I fixed all instances yesterday, but my query for detecting them was wrong.

We don’t yet have alerting features like that, though we are working on something.

You couldn’t have detected this, automatically or manually, by directly observing your database instances. The only indirect sign was that your app failed to connect to the database on boot. External monitoring services might’ve alerted you about this earlier (I’m just throwing out ideas, not saying you should’ve had that).
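For the "from the app" part of the question, one indirect check is to resolve the database hostname yourself and alert when the lookup fails. This is only a rough sketch, not an official Fly feature: the function name can_resolve and the hostname my-db.internal are placeholders I made up, and the port 5432 is just the usual Postgres default.

```python
import socket


def can_resolve(hostname, port=5432, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves to at least one address.

    `resolver` defaults to socket.getaddrinfo; it is a parameter only so
    the check can be exercised without real DNS.
    """
    try:
        return len(resolver(hostname, port)) > 0
    except socket.gaierror:
        # Raised when the name cannot be resolved (for example NXDOMAIN).
        return False
```

You could call can_resolve("my-db.internal") from a periodic health check in the app and log or alert when it returns False. It only tells you that the name does not resolve, not why.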

We’re looking into the sequence of events that led to this situation. It seemed correlated with the host rebooting.

Restarting the database instance (the machine, via fly machine restart, not via fly pg restart) would’ve fixed this. That’s how I fixed it. We now have a better way to fix it without a restart in case it happens again. Ideally it won’t; that’s what we’re working towards.
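If you wanted to script that temporary fix, a minimal sketch could shell out to flyctl. Assumptions here: the fly binary is installed and authenticated wherever this runs, the helper names restart_command and restart_machine are made up for illustration, and you supply your own machine ID and database app name.

```python
import subprocess


def restart_command(machine_id, app):
    """Build the flyctl command that restarts a single machine."""
    return ["fly", "machine", "restart", machine_id, "--app", app]


def restart_machine(machine_id, app):
    """Run the restart; raises CalledProcessError if flyctl exits non-zero."""
    subprocess.run(restart_command(machine_id, app), check=True)
```

Note that this targets the database's machine, not the app that connects to it. Restarting the app alone did not help in this case.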
