Failing Postgres DB connection from app here and there

Hello,

my app sometimes complains about not being able to connect to the database. Sometimes that happens for several minutes. Sometimes just for a moment. Now I was finally able to check logs when that was happening:

2021-10-03T09:09:41.265270079Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:09:41.264Z	INFO	cmd/keeper.go:1505	our db requested role is master
2021-10-03T09:09:41.265942145Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:09:41.265Z	INFO	cmd/keeper.go:1543	already master
2021-10-03T09:09:41.281351552Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:09:41.281Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-10-03T09:09:41.281509352Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:09:41.281Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-10-03T09:09:42.180254311Z app[452e7fe8] fra [info] proxy    | 2021-10-03T09:09:42.179Z	INFO	cmd/proxy.go:286	proxying to master address	{"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
2021-10-03T09:09:42.841680568Z app[452e7fe8] fra [info] sentinel | 2021-10-03T09:09:42.841Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "961135c1", "keeper": "66024582"}
2021-10-03T09:09:46.418738243Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:09:46.418Z	INFO	cmd/keeper.go:1505	our db requested role is master
2021-10-03T09:09:46.419636788Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:09:46.419Z	INFO	cmd/keeper.go:1543	already master
2021-10-03T09:09:46.436718075Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:09:46.436Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-10-03T09:09:46.436913844Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:09:46.436Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-10-03T09:09:47.540306494Z app[452e7fe8] fra [info] proxy    | 2021-10-03T09:09:47.539Z	INFO	cmd/proxy.go:268	master address	{"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
2021-10-03T09:09:50.288729983Z app[452e7fe8] fra [info] proxy    | 2021-10-03T09:09:50.288Z	INFO	cmd/proxy.go:286	proxying to master address	{"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
2021-10-03T09:09:51.051800039Z app[452e7fe8] fra [info] sentinel | 2021-10-03T09:09:51.051Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "961135c1", "keeper": "66024582"}
2021-10-03T09:09:51.576782814Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:09:51.576Z	INFO	cmd/keeper.go:1543	already master
2021-10-03T09:09:51.597182051Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:09:51.596Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-10-03T09:09:51.597394720Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:09:51.597Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-10-03T09:09:56.787866459Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:09:56.787Z	INFO	cmd/keeper.go:1505	our db requested role is master
2021-10-03T09:09:56.789484909Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:09:56.789Z	INFO	cmd/keeper.go:1543	already master
2021-10-03T09:09:56.810319764Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:09:56.810Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-10-03T09:09:56.810576553Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:09:56.810Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-10-03T09:10:01.955190630Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:10:01.954Z	INFO	cmd/keeper.go:1505	our db requested role is master
2021-10-03T09:10:01.956007235Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:10:01.955Z	INFO	cmd/keeper.go:1543	already master
2021-10-03T09:10:01.972511096Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:10:01.972Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-10-03T09:10:01.972692625Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:10:01.972Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-10-03T09:10:05.298001322Z app[452e7fe8] fra [info] proxy    | 2021-10-03T09:10:05.289Z	INFO	cmd/proxy.go:304	check timeout timer fired
2021-10-03T09:10:05.298039262Z app[452e7fe8] fra [info] proxy    | 2021-10-03T09:10:05.289Z	INFO	cmd/proxy.go:158	Stopping listening
2021-10-03T09:10:07.016862852Z app[452e7fe8] fra [info] proxy    | 2021-10-03T09:10:07.016Z	INFO	cmd/proxy.go:124	Starting proxying
2021-10-03T09:10:07.018774611Z app[452e7fe8] fra [info] proxy    | 2021-10-03T09:10:07.018Z	INFO	cmd/proxy.go:268	master address	{"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
2021-10-03T09:10:07.119153572Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:10:07.118Z	INFO	cmd/keeper.go:1505	our db requested role is master
2021-10-03T09:10:07.119892608Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:10:07.119Z	INFO	cmd/keeper.go:1543	already master
2021-10-03T09:10:07.142549712Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:10:07.142Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-10-03T09:10:07.142831290Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:10:07.142Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-10-03T09:10:07.314614106Z app[452e7fe8] fra [info] proxy    | 2021-10-03T09:10:07.314Z	INFO	cmd/proxy.go:286	proxying to master address	{"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
2021-10-03T09:10:08.639333862Z app[452e7fe8] fra [info] keeper   | 2021-10-03 09:10:08.637 UTC [992] LOG:  PID 22688 in cancel request did not match any process
2021-10-03T09:10:09.300942056Z app[452e7fe8] fra [info] sentinel | 2021-10-03T09:10:09.300Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "961135c1", "keeper": "66024582"}
2021-10-03T09:10:12.289505408Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:10:12.289Z	INFO	cmd/keeper.go:1505	our db requested role is master
2021-10-03T09:10:12.290365683Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:10:12.290Z	INFO	cmd/keeper.go:1543	already master
2021-10-03T09:10:12.308999052Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:10:12.308Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-10-03T09:10:12.309609178Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:10:12.309Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-10-03T09:10:12.414531993Z app[452e7fe8] fra [info] proxy    | 2021-10-03T09:10:12.414Z	INFO	cmd/proxy.go:268	master address	{"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
2021-10-03T09:10:12.711157265Z app[452e7fe8] fra [info] proxy    | 2021-10-03T09:10:12.710Z	INFO	cmd/proxy.go:286	proxying to master address	{"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
2021-10-03T09:10:15.789068151Z app[452e7fe8] fra [info] sentinel | 2021-10-03T09:10:15.788Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "961135c1", "keeper": "66024582"}
2021-10-03T09:10:17.447023992Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:10:17.446Z	INFO	cmd/keeper.go:1505	our db requested role is master
2021-10-03T09:10:17.465355902Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:10:17.465Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-10-03T09:10:17.465601141Z app[452e7fe8] fra [info] keeper   | 2021-10-03T09:10:17.465Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-10-03T09:10:17.817395117Z app[452e7fe8] fra [info] proxy    | 2021-10-03T09:10:17.817Z	INFO	cmd/proxy.go:268	master address	{"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}
2021-10-03T09:10:18.536937955Z app[452e7fe8] fra [info] proxy    | 2021-10-03T09:10:18.536Z	INFO	cmd/proxy.go:286	proxying to master address	{"address": "[fdaa:0:2d8e:a7b:67:0:2457:2]:5433"}

This look suspicious for me:

2021-10-03T09:10:05.298001322Z app[452e7fe8] fra [info] proxy    | 2021-10-03T09:10:05.289Z	INFO	cmd/proxy.go:304	check timeout timer fired
2021-10-03T09:10:05.298039262Z app[452e7fe8] fra [info] proxy    | 2021-10-03T09:10:05.289Z	INFO	cmd/proxy.go:158	Stopping listening
2021-10-03T09:10:07.016862852Z app[452e7fe8] fra [info] proxy    | 2021-10-03T09:10:07.016Z	INFO	cmd/proxy.go:124	Starting proxying

Any idea, what’s happening here?

And currently it’s happening a lot…

Same here on my postgres app staging-db

Connections are often dropped from 2:45pm today.

1 Like

Looks like our proxy was hitting its connection limits… This was preventing Stolon from reliably connecting to the Consul backend store, which explains the issues you are seeing. We have addressed the immediate issue and will be working to help ensure this doesn’t happen again in the future.

I apologize for the inconvenience and let me know you’re still experiencing issues.

2 Likes

I’m still seeing these errors. There have been dozens of alerts so far this morning. Is this related, or another issue? Postgres DBs throwing alerts - #11 by bekit

@shaun Thanks for the update. I’m still seeing errors, let me know when this should be solved.

2 Likes

It no longer seems intermittent and has brought our staging and prod environments completely down in ORD.

1 Like

I’m getting these alerts on the regular and now down completely in yyz

1 Like

Sorry for the trouble folks. We’re taking a look now.

This needs to be escalated ASAP. All databases are down, and restarting or scaling them doesn’t get them working back up again. Our app has been down for about an hour.

4 Likes

Logs have changed a little bit yeah :grimacing:

2021-10-03T16:56:01.776270036Z app[e4da363d] fra [info] keeper   | 2021-10-03T16:56:01.775Z	ERROR	cmd/keeper.go:720	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-03T16:56:01.867448422Z app[e4da363d] fra [info] keeper   | 2021-10-03T16:56:01.867Z	INFO	cmd/keeper.go:1094	our db boot UID is different than the cluster data one, waiting for it to be updated	{"bootUUID": "41f8bd49-67fa-4657-8339-755d4b195ba6", "clusterBootUUID": "debb134c-4340-444a-9bf4-dced7cb41299"}
2021-10-03T16:56:04.276758302Z app[e4da363d] fra [info] keeper   | 2021-10-03T16:56:04.276Z	ERROR	cmd/keeper.go:720	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-03T16:56:06.777890017Z app[e4da363d] fra [info] keeper   | 2021-10-03T16:56:06.777Z	ERROR	cmd/keeper.go:720	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-03T16:56:06.974886015Z app[e4da363d] fra [info] keeper   | 2021-10-03T16:56:06.974Z	INFO	cmd/keeper.go:1094	our db boot UID is different than the cluster data one, waiting for it to be updated	{"bootUUID": "41f8bd49-67fa-4657-8339-755d4b195ba6", "clusterBootUUID": "debb134c-4340-444a-9bf4-dced7cb41299"}
2021-10-03T16:56:08.536901725Z app[e4da363d] fra [info] exporter | INFO[0011] Established new database connection to "fdaa:0:2d8e:a7b:67:0:2457:2:5433".  source="postgres_exporter.go:970"
2021-10-03T16:56:09.278877543Z app[e4da363d] fra [info] keeper   | 2021-10-03T16:56:09.278Z	ERROR	cmd/keeper.go:720	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-03T16:56:09.538423404Z app[e4da363d] fra [info] exporter | ERRO[0012] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:2d8e:a7b:67:0:2457:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:2d8e:a7b:67:0:2457:2]:5433: connect: connection refused  source="postgres_exporter.go:1658"
2021-10-03T16:56:11.779980241Z app[e4da363d] fra [info] keeper   | 2021-10-03T16:56:11.779Z	ERROR	cmd/keeper.go:720	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2 Likes

Same problem here - my app can’t access the DB at all, been that way for an hour or so.

logs:

2021-10-03T16:58:50.407135953Z app[32c976eb] fra [info] checking stolon status
2021-10-03T16:58:51.313621389Z app[068a5f69] fra [info] sentinel | 2021-10-03T16:58:51.313Z	INFO	cmd/sentinel.go:94	sentinel leadership lost
2021-10-03T16:58:51.514116207Z app[068a5f69] fra [info] keeper   | 2021-10-03T16:58:51.513Z	ERROR	cmd/keeper.go:720	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-03T16:58:51.591402316Z app[068a5f69] fra [info] sentinel | 2021-10-03T16:58:51.591Z	INFO	cmd/sentinel.go:89	sentinel leadership acquired
2021-10-03T16:58:52.467285287Z app[32c976eb] fra [info] keeper   | 2021-10-03T16:58:52.466Z	ERROR	cmd/keeper.go:720	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-03T16:58:54.015090623Z app[068a5f69] fra [info] keeper   | 2021-10-03T16:58:54.014Z	ERROR	cmd/keeper.go:720	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-03T16:58:54.968306401Z app[32c976eb] fra [info] keeper   | 2021-10-03T16:58:54.967Z	ERROR	cmd/keeper.go:720	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-03T16:58:55.188249273Z app[32c976eb] fra [info] sentinel | 2021-10-03T16:58:55.187Z	INFO	cmd/sentinel.go:82	Trying to acquire sentinels leadership
2021-10-03T16:58:56.515946242Z app[068a5f69] fra [info] keeper   | 2021-10-03T16:58:56.515Z	ERROR	cmd/keeper.go:720	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-03T16:58:57.469372784Z app[32c976eb] fra [info] keeper   | 2021-10-03T16:58:57.469Z	ERROR	cmd/keeper.go:720	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-03T16:58:59.016784361Z app[068a5f69] fra [info] keeper   | 2021-10-03T16:58:59.016Z	ERROR	cmd/keeper.go:720	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-03T16:58:59.970264849Z app[32c976eb] fra [info] keeper   | 2021-10-03T16:58:59.970Z	ERROR	cmd/keeper.go:720	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-03T16:59:01.517537692Z app[068a5f69] fra [info] keeper   | 2021-10-03T16:59:01.517Z	ERROR	cmd/keeper.go:720	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-10-03T16:59:02.471153453Z app[32c976eb] fra [info] keeper   | 2021-10-03T16:59:02.470Z	ERROR	cmd/keeper.go:720	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

Same problem, everything is completely down.

Same error here in lhr.

It actually states this now: Identified - Deploying our global state store (Consul) has caused all PostgreSQL clusters to go down. We’re working on resolving the issue.

1 Like

We’re working on this, pretty much everyone who can is on it! The status page has an entry now, we’ll keep it updated with new info.

4 Likes

FWIW, my app seems to work again!

Yeah, same here :slight_smile: Thanks a lot!

my logs keep spitting no eligible masters (cmd/sentinel.go:1009)

Also a lot of DB nodes are gone when I run fly status -a database-app