Postgres does not scale past the initial 2 instances

I executed the following commands:

flyctl volumes create pg_data --region iad --size 10 -a there-db
flyctl regions add iad -a there-db
flyctl scale count 3 -a there-db

But still no new instance seems to be spun up.
What do you suggest?

One of my friends has also experienced this issue. He tried scaling down, scaling up, deleting, etc., and apparently none of it worked.

Sometimes volumes take a little while to get created on our servers. Did you only try once, shortly after creating the volume?

Would you mind retrying this?

flyctl regions add iad -a there-db
flyctl scale count 3 -a there-db

Could you scale an instance yourself?

Didn’t make a difference, unfortunately.

Seems like the volume is not attached (whatever that means!)
See result of volumes list:

vol_02gk9vw7wmr76wm8 pg_data 10GB iad                10 minutes ago
vol_70zy6r76n34djngp pg_data 10GB fra    88e4cbb9    1 day ago
vol_w0enxv37xo48okpy pg_data 10GB fra    bb777e79    1 day ago
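
For anyone reading a listing like this later: the row with an empty column between the region and the age is the volume that no instance has claimed. Below is a rough sketch of how that could be filtered out with awk. It assumes the column layout shown above (ID, name, size, region, optional 8-character hex instance ID, age); the function name unattached_volumes is made up for illustration, and the real flyctl output may differ between versions.

```shell
#!/bin/sh
# Sketch only: assumes the column layout pasted above.
# unattached_volumes is a hypothetical helper, not part of flyctl.
unattached_volumes() {
  awk '$1 ~ /^vol_/ {
    # An attached row has an 8-character hex instance ID in column 5;
    # on an unattached row, column 5 is already the start of the age.
    attached = (length($5) == 8 && $5 ~ /^[0-9a-f]+$/)
    if (!attached) print $1, $4
  }'
}

# Feed it the listing from this thread.
unattached_volumes <<'EOF'
vol_02gk9vw7wmr76wm8 pg_data 10GB iad                10 minutes ago
vol_70zy6r76n34djngp pg_data 10GB fra    88e4cbb9    1 day ago
vol_w0enxv37xo48okpy pg_data 10GB fra    bb777e79    1 day ago
EOF
```

In practice the same function could read the output of the volumes list command through a pipe instead of a pasted listing.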

Oof this process is full of sharp edges. This was another bug keeping your cluster from doing what it should have. I cleaned it up and added a volume in IAD for you. Let me know if you want it undone.

Just tried adding a new replica in ams and logs look totally abnormal:

connect: connection refused  source="postgres_exporter.go:1658"
2021-04-27T07:44:06.684Z 73ddac4c iad [info] Shutting down virtual machine
2021-04-27T07:44:07.214Z 73ddac4c iad [info] Sending signal SIGTERM to main child process w/ PID 511
2021-04-27T07:44:07.219Z 73ddac4c iad [info] keeper            | Interrupting...
2021-04-27T07:44:07.220Z 73ddac4c iad [info] sentinel          | Interrupting...
2021-04-27T07:44:07.220Z 73ddac4c iad [info] proxy             | Interrupting...
2021-04-27T07:44:07.221Z 73ddac4c iad [info] postgres_exporter | Interrupting...
2021-04-27T07:44:07.313Z 73ddac4c iad [info] proxy             | Exited
2021-04-27T07:44:07.314Z 73ddac4c iad [info] sentinel          | Exited
2021-04-27T07:44:07.314Z 73ddac4c iad [info] postgres_exporter | Exited
2021-04-27T07:44:08.215Z 73ddac4c iad [info] Reaped child process with pid: 560 and signal: SIGHUP, core dumped? false
2021-04-27T07:44:08.313Z 73ddac4c iad [info] keeper            | Exited
2021-04-27T07:44:10.218Z 73ddac4c iad [info] Reaped child process with pid: 557, exit code: 0
2021-04-27T07:44:10.218Z 73ddac4c iad [info] Main child exited normally with code: 0
2021-04-27T07:44:10.219Z 73ddac4c iad [info] Starting clean up.
2021-04-27T07:44:10.232Z 73ddac4c iad [info] Umounting /dev/vdc from /data
2021-04-27T07:44:12.222Z 73ddac4c iad [info] Starting instance
2021-04-27T07:44:12.251Z 73ddac4c iad [info] Configuring virtual machine
2021-04-27T07:44:12.252Z 73ddac4c iad [info] Pulling container image
2021-04-27T07:44:14.568Z 73ddac4c iad [info] Unpacking image
2021-04-27T07:44:14.573Z 73ddac4c iad [info] Preparing kernel init
2021-04-27T07:44:14.694Z 73ddac4c iad [info] Setting up volume 'pg_data'
2021-04-27T07:44:14.945Z 73ddac4c iad [info] Configuring firecracker
2021-04-27T07:44:14.969Z 73ddac4c iad [info] Starting virtual machine
2021-04-27T07:44:15.093Z 73ddac4c iad [info] Starting init (commit: 665705e)...
2021-04-27T07:44:15.108Z 73ddac4c iad [info] Mounting /dev/vdc at /data
2021-04-27T07:44:15.111Z 73ddac4c iad [info] Running: ` /fly/` as root
2021-04-27T07:44:15.126Z 73ddac4c iad [info] 2021/04/27 07:44:15 listening on [fdaa:0:20e2:a7b:ab8:0:18d1:2]:22 (DNS: [fdaa::3]:53)
2021-04-27T07:44:15.256Z 73ddac4c iad [info] system            | Tmux socket name: overmind-fly-6jV6ExYwoWxY9HP3aFDTfl
2021-04-27T07:44:15.257Z 73ddac4c iad [info] system            | Tmux session ID: fly
2021-04-27T07:44:15.258Z 73ddac4c iad [info] system            | Listening at ./.overmind.sock
2021-04-27T07:44:15.357Z 73ddac4c iad [info] update_config     | Started with pid 573...
2021-04-27T07:44:15.372Z 73ddac4c iad [info] postgres_exporter | Started with pid 570...
2021-04-27T07:44:15.372Z 73ddac4c iad [info] proxy             | Started with pid 567...
2021-04-27T07:44:15.372Z 73ddac4c iad [info] sentinel          | Started with pid 565...
2021-04-27T07:44:15.372Z 73ddac4c iad [info] keeper            | Started with pid 562...
2021-04-27T07:44:15.390Z 73ddac4c iad [info] postgres_exporter | INFO[0000] Starting Server: :9187                        source="postgres_exporter.go:1837"
2021-04-27T07:44:15.521Z 73ddac4c iad [info] sentinel          | 2021-04-27T07:44:15.517Z	INFO	cmd/sentinel.go:2000	sentinel uid	{"uid": "caf84b3c"}
2021-04-27T07:44:15.955Z 73ddac4c iad [info] postgres_exporter | INFO[0000] Established new database connection to "fdaa:0:20e2:a7b:ab8:0:18d1:2:5433".  source="postgres_exporter.go:970"
2021-04-27T07:44:16.961Z 73ddac4c iad [info] postgres_exporter | ERRO[0001] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:20e2:a7b:ab8:0:18d1:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:20e2:a7b:ab8:0:18d1:2]:5433: connect: connection refused  source="postgres_exporter.go:1658"
2021-04-27T07:44:17.446Z 73ddac4c iad [info] sentinel          | 2021-04-27T07:44:17.443Z	INFO	cmd/sentinel.go:82	Trying to acquire sentinels leadership
2021-04-27T07:44:17.464Z 73ddac4c iad [info] keeper            | 2021-04-27T07:44:17.461Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-27T07:44:17.497Z 73ddac4c iad [info] update_config     | unexpected end of JSON input
2021-04-27T07:44:17.557Z 73ddac4c iad [info] update_config     | Exited
2021-04-27T07:44:17.557Z 73ddac4c iad [info] update_config     | Exited
2021-04-27T07:44:17.557Z 73ddac4c iad [info] update_config     | Exited
2021-04-27T07:44:17.557Z 73ddac4c iad [info] update_config     | Exited
2021-04-27T07:44:17.557Z 73ddac4c iad [info] update_config     | Exited
2021-04-27T07:44:17.557Z 73ddac4c iad [info] update_config     | Exited
2021-04-27T07:44:17.557Z 73ddac4c iad [info] update_config     | Exited
2021-04-27T07:44:19.967Z 73ddac4c iad [info] keeper            | 2021-04-27T07:44:19.965Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-27T07:44:22.468Z 73ddac4c iad [info] keeper            | 2021-04-27T07:44:22.465Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-27T07:44:22.533Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.530 UTC [615] LOG:  starting PostgreSQL 12.5 (Debian 12.5-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-04-27T07:44:22.534Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.533 UTC [615] LOG:  listening on IPv6 address "fdaa:0:20e2:a7b:ab8:0:18d1:2", port 5433
2021-04-27T07:44:22.536Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.535 UTC [615] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"
2021-04-27T07:44:22.560Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.557 UTC [616] LOG:  database system was interrupted while in recovery at log time 2021-04-27 07:37:20 UTC
2021-04-27T07:44:22.562Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.557 UTC [616] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2021-04-27T07:44:22.711Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.708 UTC [618] FATAL:  the database system is starting up
2021-04-27T07:44:22.713Z 73ddac4c iad [info] keeper            | 2021-04-27T07:44:22.711Z	ERROR	cmd/keeper.go:990	failed to get replication slots	{"error": "pq: the database system is starting up"}
2021-04-27T07:44:22.715Z 73ddac4c iad [info] keeper            | 2021-04-27T07:44:22.711Z	ERROR	cmd/keeper.go:1584	error updating replication slots	{"error": "pq: the database system is starting up"}
2021-04-27T07:44:22.774Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.771 UTC [621] FATAL:  the database system is starting up
2021-04-27T07:44:22.777Z 73ddac4c iad [info] keeper            | 2021-04-27T07:44:22.774Z	ERROR	cmd/keeper.go:1686	failed to check if restart is required	{"error": "pq: the database system is starting up"}
2021-04-27T07:44:22.799Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.798 UTC [616] LOG:  entering standby mode
2021-04-27T07:44:22.803Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.801 UTC [616] FATAL:  hot standby is not possible because max_worker_processes = 1 is a lower setting than on the master server (its value was 8)
2021-04-27T07:44:22.805Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.804 UTC [615] LOG:  startup process (PID 616) exited with exit code 1
2021-04-27T07:44:22.806Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.805 UTC [615] LOG:  aborting startup due to startup process failure
2021-04-27T07:44:22.809Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.808 UTC [615] LOG:  database system is shut down

Now it keeps failing with

[925] FATAL:  hot standby is not possible because max_worker_processes = 1 is a lower setting than on the master server (its value was 8)

I saw that you’re already aware of this one. So effectively I have to migrate the database somewhere else temporarily?

The problem here is some tuning settings we’ve applied that vary by VM size: postgres-ha/ at main · fly-apps/postgres-ha · GitHub

These shouldn’t be overridden the way they are; that’s a bug. We’re going to figure out a workaround / fix today.
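
For context, PostgreSQL refuses to start a hot standby when certain settings, max_worker_processes among them, are lower on the standby than on the primary, and the FATAL line in the logs above names both values. Here is a minimal sketch for pulling the setting name and the two values out of such a line. The function name parse_hot_standby_error is hypothetical, and the pattern assumes the message wording seen in this thread (PostgreSQL 12.5); other versions may phrase it differently.

```shell
#!/bin/sh
# Sketch only: matches the message wording logged by PostgreSQL 12.5 above.
# parse_hot_standby_error is a hypothetical helper name.
parse_hot_standby_error() {
  # Captures: \1 = setting name, \2 = value on the standby, \3 = value on the primary
  sed -n 's/.*hot standby is not possible because \([a-z_]*\) = \([0-9]*\) is a lower setting than on the master server (its value was \([0-9]*\)).*/\1 standby=\2 primary=\3/p'
}

# Feed it the error line quoted in this thread.
parse_hot_standby_error <<'EOF'
[925] FATAL:  hot standby is not possible because max_worker_processes = 1 is a lower setting than on the master server (its value was 8)
EOF
```

Lines that don't match the pattern are dropped, so a whole log dump can be piped through it to see which setting the replica is tripping on.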