I’ve been running a machine for quite some time. Got an email the other day about “Some of your apps in AMS are on a host which has suffered a hardware failure and will be down for an extended period.”
Then a couple of days ago I got a new email saying the host won’t recover, with a link to instructions on how to get up and running again and how to reuse a volume by creating a new volume based on snapshots of the old volume.
So I tried to follow “Apps with one Machine and an attached volume”, and it says to:
- fly volumes list
- fly volumes snapshots list
- fly volumes create --snapshot-id -s
- fly scale count 1
So I did all that. At step 3 it creates a volume, and at step 4 it deploys alright, but when deploying it also creates a new volume rather than using the one I created in step 3.
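Concretely, what I ran looked roughly like this (the app name, volume name, and IDs below are placeholders; the real invocations were copy-pasted from the docs page):

```text
$ fly volumes list -a my-app
$ fly volumes snapshots list vol_xxxxxxxxxxxxxxxxx
$ fly volumes create my_data --snapshot-id vs_xxxxxxxxxxxxxxxxx -s 20 -a my-app
$ fly scale count 1 -a my-app
```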
How can I force my app to use the volume I created from a snapshot?
Some of the commands had their formatting mangled when pasting, but you still get the idea? (They were just copy-pasted from the linked page.)
I have exactly the same problem. I followed all the steps and restored from the snapshot, but the instance created from the snapshot uses the new volume and not the previous one I was using, meaning that the database is empty in my case. Any insight on how to solve this issue?
I was assuming that if you restore from the snapshot it would be a replica of the original, and that the old instance would just be detached and the new one attached?
Hi… Sorry you’re having so much trouble with this… Volumes are tied to specific regions, which may be the source of the problem.
Perhaps you could post your `fly.toml`, particularly the `primary_region`, as well as the outputs of the `fly volumes list` and `fly m list` commands?
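(The fly.toml bits I mean look roughly like this; the app name, region, and volume name below are just placeholders:)

```toml
app = "my-app"
primary_region = "ams"

[mounts]
  source = "my_data"
  destination = "/data"
```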
Aside: You can use triple-backticks (```) to avoid formatting problems. E.g.,
```text
$ fly volumes list
$ fly m list
```
The above would come out as…

```text
$ fly volumes list
$ fly m list
```
Basically, I’m unable to attach this volume to any pg app. If I create a new pg app from the snapshot of this volume, there is no previous data, just an empty database.
```text
ID                     STATE    NAME     SIZE  REGION  ZONE  ENCRYPTED  ATTACHED VM  CREATED AT
vol_g67340kjpj2vydxw*  created  pg_data  20GB  ams     f6b7  true                    1 year ago

* These volumes' hosts could not be reached.
```
Tried clone, fork, and attach, all the options, and I always got the same outcome. Btw, due to the irreparable hardware damage, all my machines were destroyed.
Yeah, these cases are pretty harsh. Sorry you ran into that!
I’m mainly asking about the new machines and new volumes that were created subsequently, though. There are many small things that could have slipped out of alignment at this point.
The first part is true. However, the second is not necessarily so simple, with Postgres.
Could you try creating a new PG machine from a snapshot, again?
We can start by verifying the machines and volumes lists and then looking inside with `fly ssh console`…
(Also, it would help if you could report the exact, entire invocations that were used, since the details really do matter here.)
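Something along these lines, with your own app name in place of the placeholder:

```text
$ fly m list -a <app-name>
$ fly volumes list -a <app-name>
$ fly ssh console -a <app-name>
```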
Sure, so it goes like this:
My machines are destroyed for the pg app `ht-db-cluster`; the volume is there but shows usage 0/20GB, so it’s empty, and I want to restore it from a snapshot like this:
```text
$ fly volumes list -a ht-db-cluster
ID                     STATE    NAME     SIZE  REGION  ZONE  ENCRYPTED  ATTACHED VM  CREATED AT
vol_g67340kjpj2vydxw*  created  pg_data  20GB  ams     f6b7  true                    1 year ago

$ fly volumes snapshots list vol_g67340kjpj2vydxw
ID                     STATUS   SIZE       CREATED AT  RETENTION DAYS
vs_4B2RqvpBKYVack7MVZ  created  611561223  3 days ago  60
vs_4B2RqvpBKYVacNa9Z   created  611561222  4 days ago  60
vs_zGQ7mg3GL4KaHMDm62  created  611561222  5 days ago  60
vs_91XqnLQ1mzwlh74ngl  created  611561230  6 days ago  60
vs_RjVJ4y3jZGQPU1kynp  created  611561228  1 week ago  60
vs_P273e8M2Z5BycyJp0R  created  611561229  1 week ago  60

$ fly postgres create --snapshot-id vs_4B2RqvpBKYVack7MVZ -r ams --image-ref flyio/postgres:14.6
```
I get the connection string for the new instance and connect from a client, but the database is empty, even though it shows usage of 1+ GB.
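(For what it’s worth, the same check can also be made without an external client, roughly like this:)

```text
$ fly postgres connect -a shy-river-7114
# then, inside psql:
\l
\dt
```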
```text
$ fly m list -a shy-river-7114
ID              NAME                 STATE    CHECKS  REGION  ROLE    IMAGE                          IP ADDRESS                      VOLUME                CREATED               LAST UPDATED          APP PLATFORM  PROCESS GROUP  SIZE
185e649c492498  holy-waterfall-8460  started  3/3     ams     leader  flyio/postgres:14.6 (v0.0.41)  fdaa:0:4f49:a7b:3e:fce6:a664:2  vol_vlp976oe2395gzp4  2024-08-11T09:51:36Z  2024-08-11T09:57:02Z  v2                           shared-cpu-2x:4096MB

$ fly vol list -a shy-river-7114
ID                    STATE    NAME     SIZE  REGION  ZONE  ENCRYPTED  ATTACHED VM     CREATED AT
vol_vlp976oe2395gzp4  created  pg_data  20GB  ams     95a6  true       185e649c492498  35 minutes ago
```
These are all the steps taken from my side, and whatever I do, the db is always empty on the new app, even though the volume shows usage.
Thanks for the assistance; I’m in real need of the data that was in the db.
Hello, I received the “irreparable hardware damage” email too and I am in the same situation.
I have 60-day snapshot retention for my postgresql app:
```text
$ fly volumes snapshot list vol_ez1nvxw19pzrmxl7
ID                     STATUS   SIZE       CREATED AT   RETENTION DAYS
vs_KDn1PAKqB9vvcZB8GP  created  116154277  2 days ago   60
vs_KDn1PAKqB9vvcyOkeP  created  116142405  3 days ago   60
vs_A54JgYOkXQjjCgJanR  created  116137943  4 days ago   60
vs_Zk9ZYNK3e2wwsN2YV   created  116120776  5 days ago   60
vs_M9ZwBRKqmlvvFL2k6j  created  116120776  6 days ago   60
vs_8YzNKg5BqMXXTe3DGx  created  116120776  1 week ago   60
...
vs_M9ZwBRKqmlvvFZmvb8  created  116120777  1 month ago  60
vs_qw1B5z3VAoeeilOzv   created  116120777  1 month ago  60
vs_lA8p5BJMX9NNhMmmOk  created  116120777  1 month ago  60
```
but if I try to create a new postgres app from any of the snapshots
```text
$ fly postgres create --snapshot-id vs_A54JgYOkXQjjCgJanR --image-ref flyio/postgres:13
```
the database is empty besides the defaults:
```text
postgres=# \l
                                 List of databases
   Name    |   Owner    | Encoding |  Collate   |   Ctype    |     Access privileges
-----------+------------+----------+------------+------------+---------------------------
 postgres  | flypgadmin | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | flypgadmin | UTF8     | en_US.utf8 | en_US.utf8 | =c/flypgadmin            +
           |            |          |            |            | flypgadmin=CTc/flypgadmin
 template1 | flypgadmin | UTF8     | en_US.utf8 | en_US.utf8 | =c/flypgadmin            +
           |            |          |            |            | flypgadmin=CTc/flypgadmin
```
Am I doing something wrong or do the snapshots really contain no updated data?
Thanks for all the details! It looks like this one might just be a mismatch with the connection string…
What do you see from the following?
```text
$ fly pg db list -a shy-river-7114
```
(This is another way of doing @Tommaso’s `\l`.)
This is the output
```text
$ fly pg db list -a shy-river-7114
NAME      USERS
postgres  flypgadmin, postgres, repluser
```
Hm… The 1GB usage strongly suggests that your data is there, somewhere, but it looks like we’re going to have to do more work to find it…
(A fresh PG volume is more like 100MB.)
```text
$ fly ssh console -C 'df -h' -a shy-river-7114
```
(@Tommaso, you might want to try this on your end, as well.)
This will both confirm sizes and (possibly) suggest mountpoint discrepancies…
```text
$ fly ssh console -C 'df -h' -a shy-river-7114
Connecting complete
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        2.0G     0  2.0G   0% /dev
none            7.8G  8.4M  7.4G   1% /
/dev/vdb        7.8G  8.4M  7.4G   1% /.fly-upper-layer
shm             2.0G   76K  2.0G   1% /dev/shm
tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup
/dev/vdc         20G  161M   19G   1% /data
```
This is what I get running that command
```text
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         94M     0   94M   0% /dev
none            7.8G  8.4M  7.4G   1% /
/dev/vdb        7.8G  8.4M  7.4G   1% /.fly-upper-layer
shm             107M   44K  107M   1% /dev/shm
tmpfs           107M     0  107M   0% /sys/fs/cgroup
/dev/vdc        986M  152M  767M  17% /data
```
Thank you @mayailurus for the help
I’m just wondering: if the data on the volume is encrypted, can we even use it without the machine that was used to encrypt it, since that got destroyed? The data might be there but locked by another process or something.
I believe those keys are centrally managed, but that was good thinking, overall…
The discrepancy between the internally reported 161M usage and the 1,259 MB in the screenshot is giving me pause, here. That might just be the difference between the filesystem view and the block-device view, though.
Did this database have a lot of writes and deletes in the past?
Also, does 1GB sound roughly correct to you in terms of plausible, total data?
Multiples of things have definitely been a problem in the past… (We ruled out one classic situation already, with `\l`.)
Here’s another that is sometimes useful:
```text
$ fly ssh console -C 'find / -name PG_VERSION' -a shy-river-7114
```
This will detect multiple PG clusters on the same machine.
(A normal PG volume will have multiple hits here, too, so it takes some interpretation.)
Not sure; it might also be somewhere between 50MB and 200MB. This db stored some e-commerce orders and receipts, and I did not back it up manually, counting on the system’s backups. It also held a Strapi CMS installation and configuration records.
I’m a bit lost here now
```text
/data/postgresql/PG_VERSION
/data/postgresql/base/1/PG_VERSION
/data/postgresql/base/16514/PG_VERSION
/data/postgresql/base/16513/PG_VERSION
/data/postgresql/base/5/PG_VERSION
/data/postgresql/base/4/PG_VERSION
/data/postgresql/base/16386/PG_VERSION
/data/postgres/PG_VERSION
/data/postgres/base/1/PG_VERSION
/data/postgres/base/13757/PG_VERSION
/data/postgres/base/13756/PG_VERSION
```
Ah! This is multiples of something, right? Both `postgres` and `postgresql` (with the `ql` suffix).
Odds are good that the larger one is your old data.
(I made a completely new Stolon PG cluster for comparison, and it only had `/data/postgres/`.)
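(If you want to double-check which one is larger, a du comparison along these lines should tell you; this is an untested sketch, assuming du is present in the image, and it may take a little while on a big data dir:)

```text
$ fly ssh console -C 'du -sh /data/postgres /data/postgresql' -a shy-river-7114
```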
Try the following:
```text
$ fly ssh console -C 'ls -l /data/postgresql/PG_VERSION /data/postgres/PG_VERSION' -a shy-river-7114
```
I tried your latest command, @mayailurus. The only thing I notice is that the second row has a date possibly related to when I initially created the database:
```text
-rw------- 1 stolon stolon 3 Aug 11 11:01 /data/postgres/PG_VERSION
-rw------- 1 stolon stolon 3 Jul 12 2023 /data/postgresql/PG_VERSION
```
Same here. It might be the image that I’m using, it’s 14.6; maybe pg 13 stores data in a different path.
```text
Connecting complete
-rw------- 1 stolon stolon 3 Aug 11 09:57 /data/postgres/PG_VERSION
-rw------- 1 stolon stolon 3 Jun 13 2023 /data/postgresql/PG_VERSION
```
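(I suppose the contents of the two PG_VERSION files would settle which major version each directory belongs to; something like this, untested:)

```text
$ fly ssh console -C 'cat /data/postgres/PG_VERSION /data/postgresql/PG_VERSION' -a shy-river-7114
```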