Postgres does not scale past the initial 2 instances

I ran the following commands:

flyctl volumes create pg_data --region iad --size 10 -a there-db
flyctl regions add iad -a there-db
flyctl scale count 3 -a there-db

But still no new instance seems to be spun up.
What do you suggest?

Update:
A friend of mine has run into the same issue. They tried scaling down, scaling up, deleting, and so on, and apparently none of it worked.

Sometimes volumes take a little while to get created on our servers. Did you only try once, shortly after creating the volume?

Would you mind retrying this?

flyctl regions add iad -a there-db
flyctl scale count 3 -a there-db

Could you scale an instance yourself?

Didn’t make a difference, unfortunately.

Seems like the volume is not attached (whatever that means!)
See result of volumes list:

ID                   NAME    SIZE REGION ATTACHED VM CREATED AT
vol_02gk9vw7wmr76wm8 pg_data 10GB iad                10 minutes ago
vol_70zy6r76n34djngp pg_data 10GB fra    88e4cbb9    1 day ago
vol_w0enxv37xo48okpy pg_data 10GB fra    bb777e79    1 day ago
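
For anyone checking the same thing on a longer list, here is a minimal sketch that picks out the volumes with an empty ATTACHED VM column. It assumes the plain-text layout shown above, that volume IDs start with `vol_`, and that an attached VM shows up as an 8-character hex ID; the function name `unattached_volumes` is made up for this example.

```shell
# Print the IDs of volumes whose ATTACHED VM column is empty.
# Assumption: rows look like the table above, so an attached row has an
# 8-character hex VM ID as its fifth whitespace-separated field, while an
# unattached row has the start of the CREATED AT text there instead.
unattached_volumes() {
  awk '$1 ~ /^vol_/ && !(length($5) == 8 && $5 ~ /^[0-9a-f]+$/) { print $1 }'
}

# Example: paste the saved "volumes list" output on stdin.
unattached_volumes <<'EOF'
ID                   NAME    SIZE REGION ATTACHED VM CREATED AT
vol_02gk9vw7wmr76wm8 pg_data 10GB iad                10 minutes ago
vol_70zy6r76n34djngp pg_data 10GB fra    88e4cbb9    1 day ago
vol_w0enxv37xo48okpy pg_data 10GB fra    bb777e79    1 day ago
EOF
```

With the table from this post, only the new iad volume should be printed.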

Oof, this process is full of sharp edges. This was another bug keeping your cluster from doing what it should. I cleaned it up and added a volume in IAD for you. Let me know if you want it undone.


Just tried adding a new replica in ams and logs look totally abnormal:

connect: connection refused  source="postgres_exporter.go:1658"
2021-04-27T07:44:06.684Z 73ddac4c iad [info] Shutting down virtual machine
2021-04-27T07:44:07.214Z 73ddac4c iad [info] Sending signal SIGTERM to main child process w/ PID 511
2021-04-27T07:44:07.219Z 73ddac4c iad [info] keeper            | Interrupting...
2021-04-27T07:44:07.220Z 73ddac4c iad [info] sentinel          | Interrupting...
2021-04-27T07:44:07.220Z 73ddac4c iad [info] proxy             | Interrupting...
2021-04-27T07:44:07.221Z 73ddac4c iad [info] postgres_exporter | Interrupting...
2021-04-27T07:44:07.313Z 73ddac4c iad [info] proxy             | Exited
2021-04-27T07:44:07.314Z 73ddac4c iad [info] sentinel          | Exited
2021-04-27T07:44:07.314Z 73ddac4c iad [info] postgres_exporter | Exited
2021-04-27T07:44:08.215Z 73ddac4c iad [info] Reaped child process with pid: 560 and signal: SIGHUP, core dumped? false
2021-04-27T07:44:08.313Z 73ddac4c iad [info] keeper            | Exited
2021-04-27T07:44:10.218Z 73ddac4c iad [info] Reaped child process with pid: 557, exit code: 0
2021-04-27T07:44:10.218Z 73ddac4c iad [info] Main child exited normally with code: 0
2021-04-27T07:44:10.219Z 73ddac4c iad [info] Starting clean up.
2021-04-27T07:44:10.232Z 73ddac4c iad [info] Umounting /dev/vdc from /data
2021-04-27T07:44:12.222Z 73ddac4c iad [info] Starting instance
2021-04-27T07:44:12.251Z 73ddac4c iad [info] Configuring virtual machine
2021-04-27T07:44:12.252Z 73ddac4c iad [info] Pulling container image
2021-04-27T07:44:14.568Z 73ddac4c iad [info] Unpacking image
2021-04-27T07:44:14.573Z 73ddac4c iad [info] Preparing kernel init
2021-04-27T07:44:14.694Z 73ddac4c iad [info] Setting up volume 'pg_data'
2021-04-27T07:44:14.945Z 73ddac4c iad [info] Configuring firecracker
2021-04-27T07:44:14.969Z 73ddac4c iad [info] Starting virtual machine
2021-04-27T07:44:15.093Z 73ddac4c iad [info] Starting init (commit: 665705e)...
2021-04-27T07:44:15.108Z 73ddac4c iad [info] Mounting /dev/vdc at /data
2021-04-27T07:44:15.111Z 73ddac4c iad [info] Running: `docker-entrypoint.sh /fly/start.sh` as root
2021-04-27T07:44:15.126Z 73ddac4c iad [info] 2021/04/27 07:44:15 listening on [fdaa:0:20e2:a7b:ab8:0:18d1:2]:22 (DNS: [fdaa::3]:53)
2021-04-27T07:44:15.256Z 73ddac4c iad [info] system            | Tmux socket name: overmind-fly-6jV6ExYwoWxY9HP3aFDTfl
2021-04-27T07:44:15.257Z 73ddac4c iad [info] system            | Tmux session ID: fly
2021-04-27T07:44:15.258Z 73ddac4c iad [info] system            | Listening at ./.overmind.sock
2021-04-27T07:44:15.357Z 73ddac4c iad [info] update_config     | Started with pid 573...
2021-04-27T07:44:15.372Z 73ddac4c iad [info] postgres_exporter | Started with pid 570...
2021-04-27T07:44:15.372Z 73ddac4c iad [info] proxy             | Started with pid 567...
2021-04-27T07:44:15.372Z 73ddac4c iad [info] sentinel          | Started with pid 565...
2021-04-27T07:44:15.372Z 73ddac4c iad [info] keeper            | Started with pid 562...
2021-04-27T07:44:15.390Z 73ddac4c iad [info] postgres_exporter | INFO[0000] Starting Server: :9187                        source="postgres_exporter.go:1837"
2021-04-27T07:44:15.521Z 73ddac4c iad [info] sentinel          | 2021-04-27T07:44:15.517Z	INFO	cmd/sentinel.go:2000	sentinel uid	{"uid": "caf84b3c"}
2021-04-27T07:44:15.955Z 73ddac4c iad [info] postgres_exporter | INFO[0000] Established new database connection to "fdaa:0:20e2:a7b:ab8:0:18d1:2:5433".  source="postgres_exporter.go:970"
2021-04-27T07:44:16.961Z 73ddac4c iad [info] postgres_exporter | ERRO[0001] Error opening connection to database (postgresql://flypgadmin:PASSWORD_REMOVED@[fdaa:0:20e2:a7b:ab8:0:18d1:2]:5433/postgres?sslmode=disable): dial tcp [fdaa:0:20e2:a7b:ab8:0:18d1:2]:5433: connect: connection refused  source="postgres_exporter.go:1658"
2021-04-27T07:44:17.446Z 73ddac4c iad [info] sentinel          | 2021-04-27T07:44:17.443Z	INFO	cmd/sentinel.go:82	Trying to acquire sentinels leadership
2021-04-27T07:44:17.464Z 73ddac4c iad [info] keeper            | 2021-04-27T07:44:17.461Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-27T07:44:17.497Z 73ddac4c iad [info] update_config     | unexpected end of JSON input
2021-04-27T07:44:17.557Z 73ddac4c iad [info] update_config     | Exited
2021-04-27T07:44:17.557Z 73ddac4c iad [info] update_config     | Exited
2021-04-27T07:44:17.557Z 73ddac4c iad [info] update_config     | Exited
2021-04-27T07:44:17.557Z 73ddac4c iad [info] update_config     | Exited
2021-04-27T07:44:17.557Z 73ddac4c iad [info] update_config     | Exited
2021-04-27T07:44:17.557Z 73ddac4c iad [info] update_config     | Exited
2021-04-27T07:44:17.557Z 73ddac4c iad [info] update_config     | Exited
2021-04-27T07:44:19.967Z 73ddac4c iad [info] keeper            | 2021-04-27T07:44:19.965Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-27T07:44:22.468Z 73ddac4c iad [info] keeper            | 2021-04-27T07:44:22.465Z	ERROR	cmd/keeper.go:688	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}
2021-04-27T07:44:22.533Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.530 UTC [615] LOG:  starting PostgreSQL 12.5 (Debian 12.5-1.pgdg100+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
2021-04-27T07:44:22.534Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.533 UTC [615] LOG:  listening on IPv6 address "fdaa:0:20e2:a7b:ab8:0:18d1:2", port 5433
2021-04-27T07:44:22.536Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.535 UTC [615] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5433"
2021-04-27T07:44:22.560Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.557 UTC [616] LOG:  database system was interrupted while in recovery at log time 2021-04-27 07:37:20 UTC
2021-04-27T07:44:22.562Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.557 UTC [616] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2021-04-27T07:44:22.711Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.708 UTC [618] FATAL:  the database system is starting up
2021-04-27T07:44:22.713Z 73ddac4c iad [info] keeper            | 2021-04-27T07:44:22.711Z	ERROR	cmd/keeper.go:990	failed to get replication slots	{"error": "pq: the database system is starting up"}
2021-04-27T07:44:22.715Z 73ddac4c iad [info] keeper            | 2021-04-27T07:44:22.711Z	ERROR	cmd/keeper.go:1584	error updating replication slots	{"error": "pq: the database system is starting up"}
2021-04-27T07:44:22.774Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.771 UTC [621] FATAL:  the database system is starting up
2021-04-27T07:44:22.777Z 73ddac4c iad [info] keeper            | 2021-04-27T07:44:22.774Z	ERROR	cmd/keeper.go:1686	failed to check if restart is required	{"error": "pq: the database system is starting up"}
2021-04-27T07:44:22.799Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.798 UTC [616] LOG:  entering standby mode
2021-04-27T07:44:22.803Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.801 UTC [616] FATAL:  hot standby is not possible because max_worker_processes = 1 is a lower setting than on the master server (its value was 8)
2021-04-27T07:44:22.805Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.804 UTC [615] LOG:  startup process (PID 616) exited with exit code 1
2021-04-27T07:44:22.806Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.805 UTC [615] LOG:  aborting startup due to startup process failure
2021-04-27T07:44:22.809Z 73ddac4c iad [info] keeper            | 2021-04-27 07:44:22.808 UTC [615] LOG:  database system is shut down

Now it keeps failing with

[925] FATAL:  hot standby is not possible because max_worker_processes = 1 is a lower setting than on the master server (its value was 8)

I saw that you’re already aware of this one. So, effectively, do I have to migrate the database somewhere else temporarily?
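
In a log dump this long the decisive line is easy to miss. A rough sketch for pulling out just the Postgres `FATAL` messages from a saved log, de-duplicated; it assumes the log was saved as plain text in the format pasted above, and `fatal_lines` is a made-up helper name.

```shell
# Reduce a saved log to its distinct Postgres FATAL messages.
# Assumption: each message follows the literal marker "FATAL:" on its line,
# as in the keeper output above.
fatal_lines() {
  grep 'FATAL:' | sed 's/.*FATAL:[[:space:]]*//' | sort -u
}

# Example with two lines shaped like the ones in the log:
fatal_lines <<'EOF'
keeper | 2021-04-27 07:44:22.708 UTC [618] FATAL:  the database system is starting up
keeper | 2021-04-27 07:44:22.801 UTC [616] FATAL:  hot standby is not possible because max_worker_processes = 1 is a lower setting than on the master server (its value was 8)
keeper | 2021-04-27 07:44:22.771 UTC [621] FATAL:  the database system is starting up
EOF
```

The "starting up" messages are expected while Postgres boots; the `max_worker_processes` one is what actually aborts startup.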

The problem here is some tuning settings we’ve applied that vary by VM size; see start.sh in the fly-apps/postgres-ha repository on GitHub.

These shouldn’t be overridden the way they are; that’s a bug. We’re going to figure out a workaround or fix today.
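
To make the failure condition concrete: Postgres refuses to start a hot standby when `max_worker_processes` on the standby is lower than on the primary. The sketch below only encodes that comparison; `standby_setting_ok` is a hypothetical helper, and the two numbers are the ones from the FATAL line in the log above.

```shell
# Succeeds when the standby's value is at least the primary's value,
# which is the condition hot standby checks for max_worker_processes.
standby_setting_ok() {
  primary_value=$1
  standby_value=$2
  [ "$standby_value" -ge "$primary_value" ]
}

# Values reported in the log: primary = 8, standby = 1.
if standby_setting_ok 8 1; then
  echo "standby can start"
else
  echo "standby will refuse to start: setting is lower than on the primary"
fi
```

On a running node the current value can be read with `psql -Atc 'SHOW max_worker_processes'`; how you connect to each node is up to your setup and is not shown here.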