Musto
February 24, 2023, 7:31am
This week my first DB stopped working. When I try to run an “upgrade” or a “restart” on the DB, I get this error:
Error no active leader found
Then I tried to create a new DB from a snapshot of the first one, but I got this error when I deployed an image to enable external connections:
2023-02-24T07:21:52.649 app[148ed51b739328] cdg [info] proxy | [WARNING] 054/072152 (2971) : parsing [/fly/haproxy.cfg:38]: Missing LF on last line, file might have been truncated at position 96. This will become a hard error in HAProxy 2.3.
2023-02-24T07:21:52.649 app[148ed51b739328] cdg [info] proxy | [ALERT] 054/072152 (2971) : Error(s) found in configuration file : /fly/haproxy.cfg
2023-02-24T07:21:52.650 app[148ed51b739328] cdg [info] proxy | [ALERT] 054/072152 (2971) : Fatal errors found in configuration.
2023-02-24T07:21:52.651 app[148ed51b739328] cdg [info] proxy | exit status 1
2023-02-24T07:21:52.651 app[148ed51b739328] cdg [info] proxy | restarting in 1s [attempt 379]
2023-02-24T07:21:53.653 app[148ed51b739328] cdg [info] proxy | Running...
2023-02-24T07:21:53.663 app[148ed51b739328] cdg [info] proxy | exit status 1
2023-02-24T07:21:53.663 app[148ed51b739328] cdg [info] proxy | restarting in 1s [attempt 380]
2023-02-24T07:21:54.664 app[148ed51b739328] cdg [info] proxy | Running...
2023-02-24T07:21:54.683 app[148ed51b739328] cdg [info] proxy | [NOTICE] 054/072154 (2991) : haproxy version is 2.2.9-2+deb11u3
2023-02-24T07:21:54.683 app[148ed51b739328] cdg [info] proxy | [NOTICE] 054/072154 (2991) : path to executable is /usr/sbin/haproxy
2023-02-24T07:21:54.683 app[148ed51b739328] cdg [info] proxy | [ALERT] 054/072154 (2991) : parsing [/fly/haproxy.cfg:37] : Can't create DNS resolution for server '(null)'
Does anyone have any information that could help solve the problem with either the first or the second DB?
Is it possible to check whether restoring to the newer Flex Postgres Fly app works?
# ref: fly.io/docs/postgres/managing/backup-and-restore
flyctl pg create --snapshot-id <sid> -a <app-name> --flex
shaun
February 24, 2023, 1:39pm
flyctl pg create --snapshot-id <sid> -a <app-name> --flex
This unfortunately will not work. You will need to perform a pg dump/restore to move to flex as of right now. We are actively working to make this transition process easier for users.
Musto
February 25, 2023, 10:49am
flyctl status --all -a mydb-app
ID              STATE    ROLE   REGION  HEALTH CHECKS  IMAGE                          CREATED               UPDATED
9080291c6d3dd8  started  error  cdg     3 total        flyio/postgres:14.6 (v0.0.34)  2023-01-08T12:44:20Z  2023-02-19T12:45:56Z
The pg create command with -a <app-name> --flex doesn’t work (unknown shorthand flag: ‘a’ in -a).
But when I did my first pg create from the snapshot with: fly postgres create --snapshot-id “snapshot-id”
it was successful:
... Choose app name, organization...
Waiting for 3d8d463c765389 to become healthy (started, 3/3)
Musto
February 25, 2023, 10:51am
flyctl scale count 1 -a app_name
Error it looks like your app is running on v2 of our platform, and does not support this legacy command
shaun
February 25, 2023, 4:32pm
@Musto
I took a look at your app, and it looks like your standby does not have a volume attached to it. I would remove that machine using:
fly machines stop <machine-id> --app <app-name>
fly machines remove <machine-id> --app <app-name>
Then create a new one using:
fly machines clone <primary-machine-id>
Let me know how that goes.
Musto
February 25, 2023, 7:58pm
Hey, thanks for your help!
fly machines stop <908..(machine-id)> -a <app-name>
//Success
fly machines remove <908..(machine-id)> -a <app-name>
//Success
fly machines clone <908..(machine-id)>
//Could not find app
fly machines clone<908..(machine-id)> -a <app-name>
//Success
/*
Cloning machine <908..(machine-id)> into region cdg
Volume 'pg_data' will start empty
Provisioning a new machine with image registry-1.docker.io/flyio...
Machine <148(new-machine-id)> has been created
Waiting for start and to become healthy... (1/3)
Machine has been successfully cloned!
*/
On my app (after a restart):
strapi start
2023-02-25T19:54:09.626 app[84..] cdg [info] [2023-02-25 19:54:09.624] debug: ⛔️ Server wasn't able to start properly.
2023-02-25T19:54:09.627 app[84..] cdg [info] [2023-02-25 19:54:09.626] error: Connection terminated unexpectedly
2023-02-25T19:54:09.627 app[84..] cdg [info] Error: Connection terminated unexpectedly
In my DB logs:
exporter | INFO[0843] Established new database connection to "fdaa:...". source="postgres_exporter.go:970"
2023-02-25T19:54:57.048 app[148..] cdg [info] exporter | ERRO[0844] Error opening connection to database (postgresql://flypgadmin:PASS@[fdaa:...]:5433/postgres?sslmode=disable): dial tcp [fdaa:...]:5433: connect: connection refused source="postgres_exporter.go:1658"
2023-02-25T19:54:58.151 app[148..] cdg [info] sentinel | 2023-02-25T19:54:58.151Z WARN cmd/sentinel.go:276 no keeper info available {"db": "eb...", "keeper": "5ad..."}
2023-02-25T19:54:58.155 app[148..] cdg [info] sentinel | 2023-02-25T19:54:58.155Z ERROR cmd/sentinel.go:1018 no eligible masters
shaun
February 25, 2023, 8:07pm
Strange… Looks like the volume didn’t get created.
I went ahead and created a new volume for you:
fly volumes create pg_data --size 1 --region cdg
You can view your volumes by running:
fly volumes list
Then I performed the clone command as follows:
fly machines clone <primary-machine-id> --attach-volume <new-volume-id>
That seemed to do the trick.
Musto
February 25, 2023, 8:26pm
Be careful: I was modifying the first DB (portfo…-api-m…-db), not the new one (m…-db).
Is it easier to change my app’s connection to the new DB, or to solve the problem with the first DB?
shaun
February 25, 2023, 8:36pm
Is it easier to change my app’s connection to the new DB, or to solve the problem with the first DB?
You should be able to fix your api db by doing something like:
fly volumes list --app <app-name>
Take note of the volume that doesn’t have an attached VM. This holds your primary’s data.
fly machines clone <existing-machine-id> --attach-volume <unallocated-volume-id> --app <app-name>
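If an app has several volumes, picking out the one with no attached VM can be scripted. The sketch below is hypothetical: it assumes the volume list has been exported as JSON (for example with `fly volumes list --app <app-name> --json`, if your flyctl version supports `--json`) and that each record has `id` and `attached_machine_id` fields. Those field names are assumptions and may differ between flyctl versions. It only prints the `fly machines clone` command from this thread for review; it does not run it.

```python
import json
import shlex

# ASSUMPTION: field names in the exported volume JSON; adjust to match
# what your flyctl version actually emits.
ID_KEY = "id"
ATTACHED_KEY = "attached_machine_id"


def unattached_volumes(volumes):
    """Return the IDs of volumes with no machine attached (missing or null field)."""
    return [v[ID_KEY] for v in volumes if not v.get(ATTACHED_KEY)]


def clone_command(machine_id, volume_id, app_name):
    """Build the `fly machines clone ... --attach-volume ... --app ...` argument list."""
    return [
        "fly", "machines", "clone", machine_id,
        "--attach-volume", volume_id,
        "--app", app_name,
    ]


if __name__ == "__main__":
    # Hypothetical sample data standing in for real `fly volumes list` output.
    sample = json.loads("""[
        {"id": "vol_attached", "attached_machine_id": "existing-machine-id"},
        {"id": "vol_orphan", "attached_machine_id": null}
    ]""")
    free = unattached_volumes(sample)
    if len(free) != 1:
        # Stop rather than guess which volume holds the primary's data.
        raise SystemExit(f"expected exactly one unattached volume, found {free}")
    # Print the command for a human to check before running it.
    print(shlex.join(clone_command("existing-machine-id", free[0], "app-name")))
```

With the sample data this prints `fly machines clone existing-machine-id --attach-volume vol_orphan --app app-name`. Refusing to continue when more than one volume is unattached is deliberate: only one of them holds the primary's data, so that choice is left to a person.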
Out of curiosity, which version of flyctl are you running?
shaun
February 25, 2023, 8:44pm
If you don’t have any data in these DBs yet, I would update your flyctl version and re-provision them. You’ll then be on the latest implementation of our Postgres offering:
What’s this about?
Over the last year, we have been seeing more and more Postgres apps go down due to unstable connections with our multi-tenant Consul service. Stolon, the open-source solution that we have been using for HA management, requires an always-stable connection to our Consul. The issue is that when that connection becomes unstable, PG instances start becoming inaccessible. We have been pretty disappointed with how this has been impacting our users and decided it was time to try a new approach.
Th…
Musto
February 25, 2023, 9:01pm
Flyctl version: v0.0.464
I cloned my existing machine with the unallocated volume attached:
Machine has been successfully cloned!
And it works !
Thank you very much !
Do you know what the problem was? It worked perfectly fine for weeks, and then this problem suddenly appeared. I think the answer would help many people.
Musto
February 25, 2023, 9:03pm
My DB has a lot of essential data, but thank you for the second option.
shaun
February 25, 2023, 9:07pm
Hard to say. If you ever experience anything weird, though, the best first step is to make sure you’re running the latest version of flyctl. If the weirdness persists after the upgrade, it’s at least easier for us to troubleshoot.
Happy to hear things are back in order though!