Trying to create a Postgres backup - seeing failed health checks

I'm trying to create a backup of a Postgres database for one of my apps.

I always use fly proxy and pgAdmin to create a backup. This has worked in the past, but now it's running into issues: the connection seems to close unexpectedly.

Looking at the Postgres Machine's health checks, this one seems to be failing:

2025-11-20 17:05:14 500 Internal Server Error
[✓] checkDisk: 770.56 MB (79.2%) free space on /data/ (76.89µs)
[✓] checkLoad: load averages: 0.16 0.18 0.24 (117.46µs)
[✓] memory: system spent 0s of the last 60s waiting on memory (92.3µs)
[✗] cpu: system spent 1.46s of the last 10 seconds waiting on cpu (402.78µs)
[✓] io: system spent 156ms of the last 60s waiting on io (210.77µs)

Anyone have suggestions on how to fix this? I mainly just want to be able to create a backup at this point.

Hi… It would help to know more about your cluster, since this looks like a resource mismatch. The full output of fly m list -a db-app-name would be best, if you can. (You can use the </> button in the toolbar to get an area suitable for pasting code, output, etc.)

I don’t use Postgres on Fly.io on a regular basis, but, in my experiments, the smaller Machine sizes have seemed underpowered for PG ever since CPU throttling was introduced.

73287351ad2d85	proud-sun-9232	started	1/3   	yyz   	leader	flyio/postgres:14.6 (v0.0.34)	fdaa:0:a4db:a7b:88dc:2e40:5ccc:2	vol_vpgo9gpomqqpm62v	2025-11-20T16:09:35Z	2025-11-20T16:16:42Z	app          	shared-cpu-1x:256MB

I was able to create a backup on July 20th. I haven't had a chance to move to Managed Postgres yet. Can I increase the Machine's resources, or is this some other issue?

This seems to be the relevant error in the logs:

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] 2025/11/28 13:11:15 http: panic serving [fdaa:0:a4db:a7b:88dc:2e40:5ccc:2]:48678: runtime error: invalid memory address or nil pointer dereference

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] goroutine 6860320 [running]:

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] net/http.(*conn).serve.func1(0xc0004aabe0)

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] /usr/local/go/src/net/http/server.go:1805 +0x153

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] panic(0x8270a0, 0xb41c70)

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] /usr/local/go/src/runtime/panic.go:971 +0x499

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] github.com/fly-examples/postgres-ha/pkg/check.(*Check).RawResult(0xc00008ccb0, 0x8b5cb8, 0xc000032000)

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] /go/src/github.com/fly-examples/postgres-ha/pkg/check/check.go:60 +0x60

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] github.com/fly-examples/postgres-ha/pkg/check.(*CheckSuite).RawResult(0xc000277900, 0x44126e, 0xc000427860)

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] /go/src/github.com/fly-examples/postgres-ha/pkg/check/check_suite.go:84 +0x93

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] github.com/fly-examples/postgres-ha/pkg/flycheck.handleCheckResponse(0x919388, 0xc0000f90a0, 0xc000277900, 0x91a801)

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] /go/src/github.com/fly-examples/postgres-ha/pkg/flycheck/checks.go:89 +0x51

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] github.com/fly-examples/postgres-ha/pkg/flycheck.runRoleCheck(0x919388, 0xc0000f90a0, 0xc000217700)

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] /go/src/github.com/fly-examples/postgres-ha/pkg/flycheck/checks.go:79 +0x1ba

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] net/http.HandlerFunc.ServeHTTP(0x8b53c0, 0x919388, 0xc0000f90a0, 0xc000217700)

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] /usr/local/go/src/net/http/server.go:2050 +0x44

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] net/http.(*ServeMux).ServeHTTP(0xc00007ccc0, 0x919388, 0xc0000f90a0, 0xc000217700)

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] /usr/local/go/src/net/http/server.go:2429 +0x1ad

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] github.com/go-chi/chi/v5.(*Mux).Mount.func1(0x919388, 0xc0000f90a0, 0xc000217700)

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] /go/pkg/mod/github.com/go-chi/chi/v5@v5.0.7/mux.go:314 +0x176

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] net/http.HandlerFunc.ServeHTTP(0xc0000609a0, 0x919388, 0xc0000f90a0, 0xc000217700)

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] /usr/local/go/src/net/http/server.go:2050 +0x44

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] github.com/go-chi/chi/v5.(*Mux).routeHTTP(0xc0000631a0, 0x919388, 0xc0000f90a0, 0xc000217700)

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] /go/pkg/mod/github.com/go-chi/chi/v5@v5.0.7/mux.go:442 +0x2a9

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] net/http.HandlerFunc.ServeHTTP(0xc00001fae0, 0x919388, 0xc0000f90a0, 0xc000217700)

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] /usr/local/go/src/net/http/server.go:2050 +0x44

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] github.com/go-chi/chi/v5.(*Mux).ServeHTTP(0xc0000631a0, 0x919388, 0xc0000f90a0, 0xc000217600)

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] /go/pkg/mod/github.com/go-chi/chi/v5@v5.0.7/mux.go:88 +0x310

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] net/http.serverHandler.ServeHTTP(0xc0000f80e0, 0x919388, 0xc0000f90a0, 0xc000217600)

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] /usr/local/go/src/net/http/server.go:2868 +0xa3

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] net/http.(*conn).serve(0xc0004aabe0, 0x91a900, 0xc0003c5b00)

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] /usr/local/go/src/net/http/server.go:1933 +0x8cd

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] created by net/http.(*Server).Serve

2025-11-28T13:11:15.861 app[73287351ad2d85] yyz [info] /usr/local/go/src/net/http/server.go:2994 +0x39b

Hm… You have one of the “doubly deprecated”, Stolon-based images, so this might be challenging… When you said that you “mainly just want to be able to create a backup at this point”, did you mean that you don’t need this node as a running, request-serving Machine in the future? (If you do want it later, then it would be prudent to plan some kind of migration in the near future, in my opinion.)

I would look at the metrics in Grafana first. Do you see the throttling described in the official docs, for example?

You might be able to time the pg_dump invocation for a period in which you have a large burst allowance accumulated.
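If pgAdmin keeps dropping the connection, a plain pg_dump over the proxy might also be more robust, and it's easier to retry from a script. A rough sketch; the app name, local port, database name, and user are placeholders, so substitute whatever is in your app's actual connection string:

```
# Terminal 1: forward the Machine's Postgres port to localhost (app name is a placeholder)
fly proxy 15432:5432 -a my-db-app

# Terminal 2: custom-format dump of one database (names are placeholders;
# you'll be prompted for the password unless PGPASSWORD or ~/.pgpass is set)
pg_dump -h localhost -p 15432 -U postgres -d my_database -Fc -f my_database.dump
```

The custom format (-Fc) is compressed and can later be restored selectively with pg_restore.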

Single-volume databases on the Fly.io platform have a high risk of permanent data loss, since the underlying physical host machine will fail someday.

It would be best to have a much more regular backup schedule (unless writes are so infrequent that new data really only appears every few months).
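Even a simple scheduled dump from some always-on host helps. A sketch of a crontab entry, reusing the placeholder names above and assuming that host has a persistent path to the database (a long-running fly proxy, WireGuard, or being inside the private network) and that PGPASSWORD or ~/.pgpass is set up:

```
# Nightly at 03:15: custom-format dump, one file per day (paths and names are placeholders)
# Note: % must be escaped as \% inside crontab entries.
15 3 * * * pg_dump -h localhost -p 15432 -U postgres -d my_database -Fc -f /backups/my_database-$(date +\%F).dump
```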

This is one of the things that Managed Postgres would solve for you.

So the app using the database only really runs in the summer, for a league. Looking at Grafana, I don't see high CPU utilization: around 10–30%, with no spikes.

However, I do see quite high memory usage.

What is the best path forward here?

What does this mean:

Health check for your postgres role has failed. Your cluster's membership is inconsistent

I would move this to Managed Postgres, if it were my league. My guess is that the Fly Support that comes with it includes help migrating from the state you're presently in, although their official docs don't say so explicitly.

Assuming that you were looking at the same graph that I’m thinking of, a sustained 10–30% utilization is way above the throttling threshold for a shared-1x Machine. It’s going to be hard to operate at all in that regime…
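For a rough sense of the numbers, assuming the commonly cited baseline of about 1/16 of a core per shared vCPU (check the current docs, since these figures can change):

```
shared-cpu-1x baseline  ≈ 1/16 core ≈ 6.25% of one vCPU
observed sustained use  ≈ 10–30%    ≈ 1.6–5× the baseline
→ the burst balance drains, the Machine gets throttled, and you see
  results like "spent 1.46s of the last 10 seconds waiting on cpu"
```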

The official docs describe a procedure for upgrading the specs of a PG Flex node, but that’s the next generation after what you actually have, and I wouldn’t just wing it.
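For reference, on a plain (unmanaged) Machine the generic resize command looks roughly like this. It uses the Machine ID from your fly m list output above, but the memory value and app name are placeholders, and note that applying the update restarts the Machine:

```
# Double the memory on the Postgres Machine (value in MB; app name is a placeholder)
fly machine update 73287351ad2d85 --vm-memory 512 -a my-db-app
```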

I increased the memory on the Machine and was able to create a backup, it seems. The VM is now passing the memory check. Thanks for the input.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.