Adding flycast to a legacy stolon-based postgres, no IP?

I have a stolon-based postgres app that has recently started having connection issues (PQConsumeInput() server closed the connection unexpectedly) from my Rails app (also deployed on Fly.io).

I’m not sure yet why these errors only recently started (the app has been working flawlessly for years prior to this), but as a first step I figured it would be a good opportunity to finally switch over to Flycast for postgres (something I’ve been meaning to do for a while anyway) just to rule out any DNS-type issues. Until now, my Rails app has been connecting to postgres with its .internal host name.

So, following the instructions in the above linked post, I ran fly pg add_flycast on my postgres app, which seemed to succeed. Looking at the dashboard for the app, the connection string now shows as postgresql://<app name>.flycast.

However, there are currently no IP addresses listed for the postgres app. Should there be? The above post mentions that running fly ips list should show the new Flycast IP, but this currently returns zero results.

When I update my Rails app secret from:

DATABASE_URL=postgres://<user>:<pass>@top2.nearest.of.<app name>.internal:5432/<database>

to

DATABASE_URL=postgres://<user>:<pass>@<app name>.flycast:5432/<database>?sslmode=disable

I now get the error:

There is an issue connecting with your hostname: <app name>.flycast.
Please check your database configuration and ensure there is a valid connection to your database.

Switching back to the .internal name works.

Am I right in thinking that there should be at least one IPv6 address listed in my postgres app?
And at a more fundamental level, is flycast even supported for stolon-based postgres?

(I do plan on eventually moving from Stolon to Flex, but because I want to keep the current postgres app name, I understand that the only way to do this is to tear down the existing cluster, wait for the name to be freed up, create a fresh cluster with that same name, and then restore the database from backup… so this will require some planning. A rough sketch of those steps is below.)
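For reference, here is roughly how I picture those steps (placeholder names throughout, the proxy runs in a separate terminal, and the exact flags would need double-checking against the current flyctl help):

fly proxy 15432:5432 -a <pg app>                      # local tunnel to the existing cluster
pg_dump -h localhost -p 15432 -U postgres -Fc <database> > backup.dump
fly apps destroy <pg app>                             # tear down the old cluster, then wait for the name to free up
fly postgres create --name <pg app> --region syd      # fresh Flex cluster with the same name
fly proxy 15432:5432 -a <pg app>                      # tunnel to the new cluster (create the target database first if needed)
pg_restore -h localhost -p 15432 -U postgres -d <database> --no-owner backup.dump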

Any help would be greatly appreciated.

(PS: I’d also appreciate any help with common causes of the PQConsumeInput() error, but I’m happy to leave that until I have the Flycast issue sorted.)

Just looking at the logs of my postgres app, it does seem very unhappy.

The health checks keep flapping between passing and failing.

(I’ve removed a lot of duplicate cmd/sentinel.go:276 no keeper info available lines from the excerpt below for brevity, but there were lots of them)

I’m curious about the “Your instance has hit resource limits” errors. Some of these show the system spending over a second of each 10-second window waiting on io, and some waiting on memory. This is on a database cluster with no active users, and the database size is tiny (<50MB).

The machine is a shared-1x-cpu@256MB, and the metrics show the Firecracker memory usage well below this (~166MB).
As mentioned previously, this personal side-project has worked flawlessly for a number of years, and these issues only started in the last few weeks.
There have been no app deployments or significant data changes (e.g. bulk imports or similar) that could explain the sudden issues.

I’m not really sure whether this is a Stolon issue or something else.

2025-04-02T22:38:17Z app[21781507a60489] syd [info]sentinel | 2025-04-02T22:38:16.436Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-02T22:38:18Z app[21781507a60489] syd [info]keeper   | 2025-04-02 22:38:18.835 UTC [9850] LOG:  could not receive data from client: Connection reset by peer
2025-04-02T22:38:20Z app[21781507a60489] syd [info]sentinel | 2025-04-02T22:38:20.119Z	ERROR	cmd/sentinel.go:1018	no eligible masters
2025-04-02T22:38:39Z health[21781507a60489] syd [info]Health check for your postgres database is now passing.
2025-04-02T22:38:42Z app[21781507a60489] syd [info]sentinel | 2025-04-02T22:38:41.797Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-02T22:40:16Z health[21781507a60489] syd [error]Health check for your postgres database has failed. Your database is malfunctioning.
2025-04-02T22:40:18Z app[21781507a60489] syd [info]keeper   | 2025-04-02T22:40:18.118Z	ERROR	cmd/keeper.go:719	cannot get configured pg parameters	{"error": "context deadline exceeded"}
2025-04-02T22:40:24Z app[21781507a60489] syd [info]sentinel | 2025-04-02T22:40:24.845Z	ERROR	cmd/sentinel.go:1895	cannot get keepers info	{"error": "unexpected end of JSON input"}
2025-04-02T22:40:41Z app[21781507a60489] syd [info]sentinel | 2025-04-02T22:40:41.729Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-02T22:40:41Z app[21781507a60489] syd [info]sentinel | 2025-04-02T22:40:41.798Z	ERROR	cmd/sentinel.go:1018	no eligible masters
2025-04-02T22:40:54Z health[21781507a60489] syd [info]Health check for your postgres database is now passing.
2025-04-02T22:52:20Z health[21781507a60489] syd [info]Health check for your postgres vm is now passing.
2025-04-02T23:03:43Z health[21781507a60489] syd [error]Health check for your postgres database has failed. Your database is malfunctioning.
2025-04-02T23:03:57Z health[21781507a60489] syd [info]Health check for your postgres database is now passing.
2025-04-02T23:28:27Z app[21781507a60489] syd [info]sentinel | 2025-04-02T23:28:27.702Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-02T23:51:30Z health[21781507a60489] syd [error]Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
[✗] cpu: system spent 1.68s of the last 10 seconds waiting on cpu (276.39µs)
2025-04-02T23:51:41Z health[21781507a60489] syd [info]Health check for your postgres vm is now passing.
2025-04-02T23:54:46Z app[21781507a60489] syd [info]sentinel | 2025-04-02T23:54:46.118Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T01:05:51Z health[21781507a60489] syd [error]Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
[✗] io: system spent 1.36s of the last 10 seconds waiting on io (30.3µs)
2025-04-03T01:06:01Z health[21781507a60489] syd [info]Health check for your postgres vm is now passing.
2025-04-03T01:06:22Z health[21781507a60489] syd [error]Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
[✗] memory: system spent 1.07s of the last 10 seconds waiting on memory (71.21µs)
[✗] io: system spent 1.38s of the last 10 seconds waiting on io (21.72µs)
2025-04-03T01:06:42Z health[21781507a60489] syd [info]Health check for your postgres vm is now passing.
2025-04-03T01:08:38Z health[21781507a60489] syd [error]Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
[✗] io: system spent 1.38s of the last 10 seconds waiting on io (34.43µs)
2025-04-03T01:08:52Z health[21781507a60489] syd [info]Health check for your postgres vm is now passing.
2025-04-03T01:10:22Z health[21781507a60489] syd [error]Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
[✗] memory: system spent 1.88s of the last 10 seconds waiting on memory (40.2µs)
[✗] io: system spent 2.25s of the last 10 seconds waiting on io (21.9µs)
2025-04-03T01:10:52Z health[21781507a60489] syd [info]Health check for your postgres vm is now passing.
2025-04-03T01:31:52Z health[21781507a60489] syd [error]Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
[✗] io: system spent 1.41s of the last 10 seconds waiting on io (46.53µs)
2025-04-03T01:32:02Z health[21781507a60489] syd [info]Health check for your postgres vm is now passing.
2025-04-03T01:41:23Z health[21781507a60489] syd [error]Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
[✗] io: system spent 1.06s of the last 10 seconds waiting on io (37.39µs)
2025-04-03T01:41:32Z health[21781507a60489] syd [info]Health check for your postgres vm is now passing.
2025-04-03T02:38:24Z health[21781507a60489] syd [error]Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
[✗] io: system spent 1.2s of the last 10 seconds waiting on io (41.62µs)
2025-04-03T02:38:42Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:38:41.971Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:38:55Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:38:54.789Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:39:24Z health[21781507a60489] syd [error]Health check for your postgres database has failed. Your database is malfunctioning.
HTTP GET http://172.19.137.58:5500/flycheck/pg: 500 Internal Server Error Output: [✗] transactions: Timed out (321.27ms)
2025-04-03T02:39:29Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:39:29.316Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:39:40Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:39:40.596Z	ERROR	cmd/sentinel.go:1895	cannot get keepers info	{"error": "unexpected end of JSON input"}
2025-04-03T02:39:44Z health[21781507a60489] syd [info]Health check for your postgres database is now passing.
2025-04-03T02:39:56Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:39:56.040Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:40:04Z health[21781507a60489] syd [error]Health check for your postgres database has failed. Your database is malfunctioning.

2025-04-03T02:40:04Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:40:03.719Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:40:07Z health[21781507a60489] syd [error]Health check for your postgres role has failed. Your cluster's membership is inconsistent.

2025-04-03T02:40:08Z app[21781507a60489] syd [info]keeper   | 2025-04-03T02:40:06.925Z	ERROR	cmd/keeper.go:742	error getting pg state	{"error": "query returned 0 rows"}
2025-04-03T02:40:17Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:40:17.318Z	ERROR	cmd/sentinel.go:1018	no eligible masters
2025-04-03T02:40:29Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:40:29.236Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:40:29Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:40:29.317Z	ERROR	cmd/sentinel.go:1018	no eligible masters
2025-04-03T02:40:48Z health[21781507a60489] syd [info]Health check for your postgres role is now passing.
2025-04-03T02:40:55Z app[21781507a60489] syd [info]proxy    | [WARNING] 092/024054 (683) : Server bk_db/pg1 is DOWN, reason: Layer7 timeout, check duration: 5119ms. 0 active and 1 backup servers left. Running on backup. 5 sessions active, 0 requeued, 0 remaining in queue.
2025-04-03T02:40:55Z app[21781507a60489] syd [info]proxy    | [WARNING] 092/024054 (683) : Backup Server bk_db/pg is DOWN, reason: Layer7 timeout, check duration: 5119ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2025-04-03T02:40:55Z app[21781507a60489] syd [info]proxy    | [ALERT] 092/024054 (683) : backend 'bk_db' has no server available!
2025-04-03T02:40:55Z app[21781507a60489] syd [info]keeper   | 2025-04-03 02:40:55.566 UTC [27806] LOG:  could not receive data from client: Connection reset by peer
2025-04-03T02:40:55Z app[21781507a60489] syd [info]keeper   | 2025-04-03 02:40:55.966 UTC [27805] LOG:  could not receive data from client: Connection reset by peer
2025-04-03T02:40:58Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:40:58.041Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:40:58Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:40:58.437Z	ERROR	cmd/sentinel.go:1018	no eligible masters
2025-04-03T02:41:04Z app[21781507a60489] syd [info]proxy    | [WARNING] 092/024104 (683) : Backup Server bk_db/pg is UP, reason: Layer7 check passed, code: 200, check duration: 2883ms. 0 active and 1 backup servers online. Running on backup. 0 sessions requeued, 0 total in queue.
2025-04-03T02:41:04Z app[21781507a60489] syd [info]proxy    | [WARNING] 092/024104 (683) : Server bk_db/pg1 is UP, reason: Layer7 check passed, code: 200, check duration: 2958ms. 1 active and 1 backup servers online. 0 sessions requeued, 0 total in queue.
2025-04-03T02:41:26Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:41:26.443Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:41:39Z health[21781507a60489] syd [info]Health check for your postgres database is now passing.
2025-04-03T02:41:52Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:41:52.142Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:44:39Z app[21781507a60489] syd [info]sentinel | 2025-04-03T02:44:38.915Z	WARN	cmd/sentinel.go:276	no keeper info available	{"db": "fa9ebb46", "keeper": "298522a02"}
2025-04-03T02:44:56Z health[21781507a60489] syd [error]Health check for your postgres database has failed. Your database is malfunctioning.

Hm… This does look like you might have either disk corruption or a mangled Consul state.

I don’t know the Fly Postgres internals well enough to say what exact surgery could be done, however.

Personally, I would try the volume-forking technique (with explicit volume ID) and see whether the new DB app still shows those anomalies.

(I don’t recall whether that works on Stolon-era databases, now that I think about it. If not, there’s a similar procedure based on snapshots.)
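If it helps, the forking approach I have in mind looks roughly like this (app and volume names are placeholders, and the flags are from memory, so check fly postgres create --help before relying on them):

fly volumes list -a <pg app>                                   # note the ID of the data volume
fly postgres create --name <pg app>-fork --fork-from <pg app>:<volume id>

If the forked copy comes up clean, that would point at state in the old app (Consul or otherwise) rather than the data itself.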


Yep, Flycast apps will have an address of TYPE = private and VERSION = v6. It’s not clear whether that add_flycast command has really been maintained, though. Glancing at the source code, it looks like it modifies the [[services]] definitions but doesn’t actually allocate the address… 🍥

Other things that can be checked are fly dig db-app-name.flycast and fly services list -a db-app-name. (“Services” in this context means things that the Fly Proxy intermediates; the proxy wasn’t involved with the .internal traffic.)
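And if the Flycast address really was never allocated, allocating a private IPv6 by hand should (I think) fill that gap. Something along these lines, with the app name as a placeholder:

fly ips allocate-v6 --private -a <pg app>    # creates the private (Flycast) address
fly ips list -a <pg app>                     # should now show TYPE = private, VERSION = v6
fly dig <pg app>.flycast                     # confirm the name resolves to that address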

Hope this helps a little!


Thanks for the suggestions, @mayailurus.

It sounds like the best and safest course of action at this point will be to bite the bullet and finally move off Stolon and onto Flex.

It’s something I’ve been wanting to do, but putting off for no good reason. So this is the push I needed to get it done.


Just a quick follow-up - I successfully migrated my postgres app from Stolon to Flex, and the issues I was seeing are now resolved.
