Is there a way, or a guide, on how to install TimescaleDB as an HA PostgreSQL setup?
I don’t think we have one ready, but I imagine adapting fly-apps/postgres-ha on GitHub (Postgres + Stolon for HA clusters as Fly apps) to replace plain Postgres with Timescale should work fine.
Will see if we can make an example, and in the meanwhile if you’re trying it out we can help out here.
I cloned the repo and ran:
flyctl launch
flyctl volumes create pg_data --region sea --size 10
flyctl secrets set SU_PASSWORD=[redacted] REPL_PASSWORD=[redacted]
flyctl deploy
The deploy is giving me the following error:
2021-11-08T04:54:29.000 [info] keeper | 2021-11-08T04:54:29.334Z INFO cmd/keeper.go:1676 postgres parameters not changed
2021-11-08T04:54:29.000 [info] keeper | 2021-11-08T04:54:29.334Z INFO cmd/keeper.go:1703 postgres hba entries not changed
2021-11-08T04:54:34.000 [info] keeper | 2021-11-08T04:54:34.444Z INFO cmd/keeper.go:1505 our db requested role is master
2021-11-08T04:54:34.000 [info] keeper | 2021-11-08T04:54:34.445Z INFO cmd/keeper.go:1543 already master
2021-11-08T04:54:34.000 [info] keeper | 2021-11-08T04:54:34.458Z INFO cmd/keeper.go:1676 postgres parameters not changed
2021-11-08T04:54:34.000 [info] keeper | 2021-11-08T04:54:34.459Z INFO cmd/keeper.go:1703 postgres hba entries not changed
***v1 failed - Failed due to unhealthy allocations and deploying as v2
Let me try replicating this… in the meanwhile, does the default Timescale image on Docker Hub work as is? Or are you specifically looking for a high-availability system?
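If a plain single-node Timescale works for you, a minimal fly.toml along these lines should be enough (a sketch, not something I’ve tested here; the app name, volume name and image tag are assumptions):

app = "timescale-single"

[build]
  image = "timescale/timescaledb:latest-pg13"

[env]
  # keep PGDATA in a subdirectory so initdb doesn't trip over the volume's lost+found
  PGDATA = "/var/lib/postgresql/data/pgdata"

[mounts]
  source = "pg_data"
  destination = "/var/lib/postgresql/data"

plus fly secrets set POSTGRES_PASSWORD=<password> before deploying, since the stock image follows the official postgres image’s environment-variable conventions.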
Yeah, I’m looking to not steer too far from what fly-apps/postgres-ha has.
As TimescaleDB installs just fine as an extension, I forked the postgres-ha repo and modified the Dockerfile to add what is needed for TimescaleDB without having to change the base Postgres image.
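The change is roughly this shape (exact package names and versions may differ; this uses Timescale’s packagecloud repo against the postgres:13 base of the final stage):

# extra step in the final stage of the postgres-ha Dockerfile
RUN apt-get update \
 && apt-get install --no-install-recommends -y gnupg lsb-release wget \
 && echo "deb https://packagecloud.io/timescale/timescaledb/debian/ $(lsb_release -c -s) main" > /etc/apt/sources.list.d/timescaledb.list \
 && wget --quiet -O - https://packagecloud.io/timescale/timescaledb/gpgkey | apt-key add - \
 && apt-get update \
 && apt-get install --no-install-recommends -y timescaledb-2-postgresql-13 \
 && rm -rf /var/lib/apt/lists/*

(The extension also has to be preloaded: shared_preload_libraries = 'timescaledb'. With Stolon that is set through the cluster’s pgParameters rather than by editing postgresql.conf directly.)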
I can build the image locally just fine, but it also fails when deploying.
Did you use the fly.toml in the directory as well? It has a few directives that need to be enabled.
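For reference, the important pieces look roughly like this (illustrative excerpt; the repo’s actual file is the source of truth):

[mounts]
  source = "pg_data"
  destination = "/data"

It also defines the HTTP health checks on port 5500 (vm, role and pg) that show up in fly checks list.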
That said, the error might not be an error: is the app actually working? Most of those are info lines, and the deploy does retry; did it succeed in the end?
Could you post the full deploy logs if possible? Would like to see what happens before and after the errors.
Yes, I used the same fly.toml; doing flyctl launch just renamed the project. The app wasn’t deployed. I’ll try to spin up another one and will come back with the results.
BTW running init I’m getting:
➜ postgres-ha git:(main) flyctl init
Error: unknown command "init" for "flyctl"
Did you mean this?
info
Run 'flyctl --help' for usage.
Error unknown command "init" for "flyctl"
Did you mean this?
info
➜ postgres-ha git:(main)
➜ postgres-ha git:(main) flyctl version
flyctl v0.0.250 darwin/amd64 Commit: 7a90db9 BuildDate: 2021-10-28T20:49:47Z
Think that’s deprecated, you’d probably want to do flyctl launch.
Here is the whole failing deploy
Launching
➜ postgres-ha git:(main) flyctl launch
An existing fly.toml file was found for app postgres-ha-example
? Would you like to copy its configuration to the new app? Yes
Creating app in /Users/ericktamayo/Code/metronome/postgres-ha
Scanning source code
Detected a Dockerfile app
? App Name (leave blank to use an auto-generated name): postgres-ha-example
? Select organization: Metronome (metronome)
? Select region: sea (Seattle, Washington (US))
Created app postgres-ha-example in organization metronome
Wrote config file fly.toml
? Would you like to deploy now? No
Your app is ready. Deploy with `flyctl deploy`
Here the app is created but pending deployment
➜ postgres-ha git:(main) ✗ flyctl volumes create pg_data --region sea --size 10
ID: vol_k0o6d42gyn7v87gy
Name: pg_data
Region: sea
Size GB: 10
Encrypted: true
Created at: 08 Nov 21 18:08 UTC
Adding the secrets as per the README
➜ postgres-ha git:(main) ✗ flyctl secrets set SU_PASSWORD=[redacted] REPL_PASSWORD=[redacted]
Secrets are staged for the first deployment
Deploying
➜ postgres-ha git:(main) ✗ flyctl deploy
Deploying postgres-ha-example
==> Validating app configuration
--> Validating app configuration done
==> Creating build context
--> Creating build context done
==> Building image with Docker
--> docker host: 20.10.8 linux x86_64
Sending build context to Docker daemon 172.5kB
[+] Building 4.2s (23/23) FINISHED
=> [internal] load remote build context 0.0s
=> copy /context / 0.1s
=> [internal] load metadata for docker.io/flyio/stolon:b6b9aaf 4.0s
=> [internal] load metadata for docker.io/library/postgres:13.4 4.0s
=> [internal] load metadata for docker.io/wrouesnel/postgres_exporter:latest 4.0s
=> [internal] load metadata for docker.io/library/golang:1.16 4.0s
=> [stage-3 1/9] FROM docker.io/library/postgres:13.4@sha256:1adb50e5c24f550a9e68457a2ce60e9e4103dfc43c3b36e98310168165b443a1 0.0s
=> [postgres_exporter 1/1] FROM docker.io/wrouesnel/postgres_exporter:latest@sha256:54bd3ba6bc39a9da2bf382667db4dc249c96e4cfc837dafe91d6cc7d362829e0 0.0s
=> [flyutil 1/5] FROM docker.io/library/golang:1.16@sha256:e04b1665f7caf60b88c732fa3ce41e2bcf5b4320ad77f42a15d5bcda76fc4b81 0.0s
=> [stolon 1/1] FROM docker.io/flyio/stolon:b6b9aaf@sha256:ed7dfa80c26e8cdfcc3c7316c1577c1cd60d4360d8790bb22635c619a1bf8cfe 0.0s
=> CACHED [stage-3 2/9] RUN apt-get update && apt-get install --no-install-recommends -y ca-certificates curl bash dnsutils vim-tiny procps jq haproxy postgresql-13-postgis-3 postgresql-13-postgis-3-scripts && apt a 0.0s
=> CACHED [stage-3 3/9] COPY --from=stolon /go/src/app/bin/* /usr/local/bin/ 0.0s
=> CACHED [stage-3 4/9] COPY --from=postgres_exporter /postgres_exporter /usr/local/bin/ 0.0s
=> CACHED [stage-3 5/9] ADD /scripts/* /fly/ 0.0s
=> CACHED [stage-3 6/9] ADD /config/* /fly/ 0.0s
=> CACHED [stage-3 7/9] RUN useradd -ms /bin/bash stolon 0.0s
=> CACHED [stage-3 8/9] RUN mkdir -p /run/haproxy/ 0.0s
=> CACHED [flyutil 2/5] WORKDIR /go/src/github.com/fly-examples/postgres-ha 0.0s
=> CACHED [flyutil 3/5] COPY . . 0.0s
=> CACHED [flyutil 4/5] RUN CGO_ENABLED=0 GOOS=linux go build -v -o /fly/bin/flyadmin ./cmd/flyadmin 0.0s
=> CACHED [flyutil 5/5] RUN CGO_ENABLED=0 GOOS=linux go build -v -o /fly/bin/start ./cmd/start 0.0s
=> CACHED [stage-3 9/9] COPY --from=flyutil /fly/bin/* /usr/local/bin/ 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:e5ef4f2170624687ae0b46b16bcf01496404f876ae19e5a693c2436606bac116 0.0s
=> => naming to registry.fly.io/postgres-ha-example:deployment-1636395062 0.0s
--> Building image done
==> Pushing image to fly
The push refers to repository [registry.fly.io/postgres-ha-example]
245c90cee8b9: Pushed
509bbeb2db24: Pushed
16ef445c4836: Pushed
87e4fd03453f: Pushed
afeee662292a: Pushed
b5a4612ba664: Pushed
2b62d89bcaed: Pushed
c6c0c5fd9172: Pushed
9180e7a3f39f: Pushed
5ed186b83b18: Pushed
82b6874b44d0: Pushed
0739eec8bae5: Pushed
9a1a7d8bf685: Pushed
535b38c199d6: Pushed
bb5416d92e3c: Pushed
1d69f9da9d06: Pushed
651af98b41e3: Pushed
8f6d195cb042: Pushed
756e6b21e18e: Pushed
0f912f02afd0: Pushed
e8b689711f21: Mounted from metronome-sh
deployment-1636395062: digest: sha256:e6758bf6404f8397659b85ac39109e3d10e6aedd995fab8e11c6a487adf92ce9 size: 4714
--> Pushing image done
Image: registry.fly.io/postgres-ha-example:deployment-1636395062
Image size: 781 MB
==> Creating release
Release v2 created
You can detach the terminal anytime without stopping the deployment
Monitoring Deployment
1 desired, 1 placed, 0 healthy, 1 unhealthy [health checks: 3 total, 2 passing, 1 critical]
v0 failed - Failed due to unhealthy allocations
Failed Instances
==> Failure #1
Instance
ID = b01e8595
Process =
Version = 0
Region = sea
Desired = run
Status = running (leader)
Health Checks = 3 total, 2 passing, 1 critical
Restarts = 0
Created = 4m52s ago
Recent Events
TIMESTAMP TYPE MESSAGE
2021-11-08T18:12:01Z Received Task received by client
2021-11-08T18:12:01Z Task Setup Building Task Directory
2021-11-08T18:12:16Z Started Task started by client
Recent Logs
-- the same thing repeats here several times --
2021-11-08T18:17:02.000 [info] keeper | 2021-11-08T18:17:02.269Z INFO cmd/keeper.go:1505 our db requested role is master
2021-11-08T18:17:02.000 [info] keeper | 2021-11-08T18:17:02.270Z INFO cmd/keeper.go:1543 already master
2021-11-08T18:17:02.000 [info] keeper | 2021-11-08T18:17:02.284Z INFO cmd/keeper.go:1676 postgres parameters not changed
2021-11-08T18:17:02.000 [info] keeper | 2021-11-08T18:17:02.284Z INFO cmd/keeper.go:1703 postgres hba entries not changed
***v0 failed - Failed due to unhealthy allocations and deploying as v1
Troubleshooting guide at https://fly.io/docs/getting-started/troubleshooting/
The app status on the fly dashboard says running:
But if I go to Activity:
I try to get into the db with Postico and I get a timeout:
This seems like a pretty good start, there’s a couple of things to note:
- one of the health checks is failing; you can see which one with fly checks list
- you can see the logs the VM itself is printing with fly logs
- the Postgres app does not expose a listener, so this app will not be accessible to the outside world on app.fly.dev. Instead you’ll want to join the WireGuard network using the notes at Private Networking and access app.internal, or add a public listener using the notes in Multi-region PostgreSQL
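Once you’re on the WireGuard network, connecting should look something like this (the superuser name comes from the repo’s config and the password is what you set as SU_PASSWORD; 5432 should be the port the proxy listens on, if I remember right):

psql "postgres://<su_username>:<SU_PASSWORD>@postgres-ha-example.internal:5432/postgres"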
Think looking at fly checks list and fly logs would be the first steps, though. If the logs are clear we could remove the checks if they turn out to be unnecessary.
The checks might be failing because there’s only a single instance as well; I think the HA package assumes at least two instances, so you’ll want to add another volume and run fly scale count 2 to get both up.
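Concretely, that would be something like:

fly volumes create pg_data --region sea --size 10
fly scale count 2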
Ok, I’m scaling to 2 instances, but in the meantime fly checks list is giving me:
➜ postgres-ha git:(main) ✗ fly checks list
Health Checks for postgres-ha-example
NAME STATUS ALLOCATION REGION TYPE LAST UPDATED OUTPUT
vm passing b01e8595 sea HTTP 1m41s ago HTTP GET
http://172.19.2.50:5500/flycheck/vm:
200 OK Output: "[✓] checkDisk:
9.19 GB (94.0%) free space on
/data/ (42.41µs)\n[✓] checkLoad:
load averages: 0.00 0.01 0.02
(67.33µs)\n[✓] memory: system spent
0s of the last 60s waiting on memory
(39.25µs)\n[✓] cpu: system spent
270ms of the last 60s waiting on
cpu (30.18µs)\n[✓] io: system spent
0s of the last 60s waiting on io
(26.46µs)"
role passing b01e8595 sea HTTP 35m5s ago leader
pg critical b01e8595 sea HTTP 35m16s ago HTTP GET
http://172.19.2.50:5500/flycheck/pg:
500 Internal Server Error Output:
"failed to connect to proxy: context
deadline exceeded"
Ok, I added a second volume and scaled to 2; still having issues. I’ll check if I can connect using WireGuard.
2 desired, 2 placed, 0 healthy, 2 unhealthy
v2 failed - Failed due to unhealthy allocations
***v2 failed - Failed due to unhealthy allocations and deploying as v3
fly checks list renders:
➜ postgres-ha git:(main) ✗ fly checks list
Health Checks for postgres-ha-example
NAME STATUS ALLOCATION REGION TYPE LAST UPDATED OUTPUT
vm passing b01e8595 sea HTTP 1m21s ago HTTP GET
http://172.19.2.50:5500/flycheck/vm:
200 OK Output: "[✓] checkDisk:
9.12 GB (93.3%) free space on
/data/ (41.98µs)\n[✓] checkLoad:
load averages: 0.00 0.01 0.02
(66.73µs)\n[✓] memory: system spent
0s of the last 60s waiting on memory
(40.26µs)\n[✓] cpu: system spent
318ms of the last 60s waiting on
cpu (30.37µs)\n[✓] io: system spent
0s of the last 60s waiting on io
(27.44µs)"
role passing b01e8595 sea HTTP 44m29s ago leader
pg critical b01e8595 sea HTTP 44m40s ago HTTP GET
http://172.19.2.50:5500/flycheck/pg:
500 Internal Server Error Output:
"failed to connect to proxy: context
deadline exceeded"
vm passing 2f2ebf1a sea HTTP 1m7s ago HTTP GET
http://172.19.1.42:5500/flycheck/vm:
200 OK Output: "[✓] checkDisk:
9.17 GB (93.8%) free space on
/data/ (46.4µs)\n[✓] checkLoad:
load averages: 0.00 0.03 0.05
(68.07µs)\n[✓] memory: system spent
0s of the last 60s waiting on memory
(40.69µs)\n[✓] cpu: system spent
318ms of the last 60s waiting on
cpu (30.01µs)\n[✓] io: system spent
12ms of the last 60s waiting on io
(47.58µs)"
role passing 2f2ebf1a sea HTTP 5m28s ago replica
pg critical 2f2ebf1a sea HTTP 5m45s ago HTTP GET
http://172.19.1.42:5500/flycheck/pg:
500 Internal Server Error Output:
"failed to connect to proxy: context
deadline exceeded"
I’m working on replicating this (getting timeouts on the Timescale keys), in the meanwhile do you have anything on fly logs? Might be able to see what’s happening.
Nothing that explains what is going on (or at least I don’t think so). It’s the same thing over and over:
2021-11-08T19:58:06.170 app[2f2ebf1a] sea [info] keeper | 2021-11-08T19:58:06.170Z INFO cmd/keeper.go:1557 our db requested role is standby {"followedDB": "b89233f0"}
2021-11-08T19:58:06.170 app[2f2ebf1a] sea [info] keeper | 2021-11-08T19:58:06.170Z INFO cmd/keeper.go:1576 already standby
2021-11-08T19:58:06.189 app[2f2ebf1a] sea [info] keeper | 2021-11-08T19:58:06.189Z INFO cmd/keeper.go:1676 postgres parameters not changed
2021-11-08T19:58:06.190 app[2f2ebf1a] sea [info] keeper | 2021-11-08T19:58:06.189Z INFO cmd/keeper.go:1703 postgres hba entries not changed
2021-11-08T19:58:07.174 app[b01e8595] sea [info] keeper | 2021-11-08T19:58:07.174Z INFO cmd/keeper.go:1505 our db requested role is master
2021-11-08T19:58:07.175 app[b01e8595] sea [info] keeper | 2021-11-08T19:58:07.175Z INFO cmd/keeper.go:1543 already master
2021-11-08T19:58:07.188 app[b01e8595] sea [info] keeper | 2021-11-08T19:58:07.188Z INFO cmd/keeper.go:1676 postgres parameters not changed
2021-11-08T19:58:07.189 app[b01e8595] sea [info] keeper | 2021-11-08T19:58:07.188Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Do you have any issues installing the main repo, fly-apps/postgres-ha on GitHub (Postgres + Stolon for HA clusters as Fly apps)? This is what is giving me errors. No timescaledb stuff yet.
I’m trying to work through this whole process once… there’s a couple of extra calls to make to get the HA system running, will report back as soon as I can figure them out.
Thanks @sudhir.j you’re helping a lot!
I’d actually suggest tweaking the approach a bit: if you create a normal Fly PG HA setup using fly pg create, you can then modify the postgres-ha repo to add Timescale and just redeploy it (Fly PG apps are just Fly apps with extra initialisation). That way all the setup is already done for you.
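Roughly (the app name here is a placeholder):

fly pg create --name timescale-ha --region sea
# then, from the forked postgres-ha checkout with the Timescale changes:
fly deploy --app timescale-ha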
There’s a walkthrough on adding Timescale to an existing DB here: How to Enable TimescaleDB on an Existing PostgreSQL Database (Severalnines).
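The short version of that walkthrough, for when you get to the Timescale part: install the TimescaleDB packages, make sure shared_preload_libraries includes 'timescaledb', restart Postgres, and then in each database:

-- confirm the extension is available, then enable it in this database
SELECT default_version, installed_version FROM pg_available_extensions WHERE name = 'timescaledb';
CREATE EXTENSION IF NOT EXISTS timescaledb;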