Hi, I have a Rails app and a Postgres app. Both were working fine at the end of January, and I don’t recall making any changes after about the 26th, but since the start of February the Postgres app has not been working.
The Postgres app will not boot and just sits at “pending” during releases. I have tried scaling the pg app down and back up to restart it, and redeploying it, but no luck.
Looking into it further, I think the driver failure in the events below is the cause:
```
flyctl status instance <Redacted> -a swedishbirds-db
Instance
  ID            = <Redacted>
  Process       = app
  Version       = 8
  Region        = lhr
  Desired       = stop
  Status        = failed
  Health Checks = 3 total, 3 passing
  Restarts      = 5
  Created       = 2023-01-25T15:01:42Z

Recent Events
TIMESTAMP            TYPE            MESSAGE
2023-01-25T15:01:27Z Received        Task received by client
2023-01-25T15:01:27Z Task Setup      Building Task Directory
2023-01-25T15:02:09Z Started         Task started by client
2023-01-26T19:55:47Z Terminated      Exit Code: 2
2023-01-26T19:55:48Z Restarting      Task restarting in 1.084839554s
2023-01-26T19:55:54Z Started         Task started by client
2023-01-26T21:13:44Z Terminated      Exit Code: 2
2023-01-26T21:13:44Z Restarting      Task restarting in 1.206766527s
2023-01-26T21:13:50Z Started         Task started by client
2023-02-03T02:54:20Z Terminated      Exit Code: 2
2023-02-03T02:54:20Z Restarting      Task restarting in 1.063800869s
2023-02-03T02:54:26Z Started         Task started by client
2023-02-03T17:59:46Z Terminated      Exit Code: 2
2023-02-03T17:59:47Z Restarting      Task restarting in 1.203652444s
2023-02-03T17:59:53Z Started         Task started by client
2023-02-03T18:12:48Z Terminated      Exit Code: 2
2023-02-03T18:12:48Z Restarting      Task restarting in 1.080633031s
2023-02-03T18:12:55Z Driver Failure  rpc error: code = Unknown desc = unable to create microvm: could not find device for volume with name pg_data
2023-02-03T18:12:55Z Not Restarting  Error was unrecoverable

Checks
ID   SERVICE STATE   OUTPUT
pg   app     passing HTTP GET http://172.19.64.154:5500/flycheck/pg: 200 OK Output: [✓] transactions: read/write (216.81µs)
                     [✓] connections: 11 used, 3 reserved, 300 max (3.49ms)
vm   app     passing HTTP GET http://172.19.64.154:5500/flycheck/vm: 200 OK Output: [✓] checkDisk: 799.05 MB (81.9%) free space on /data/ (32.41µs)
                     [✓] checkLoad: load averages: 0.09 0.20 0.25 (47.19µs)
                     [✓] memory: system spent 0s of the last 60s waiting on memory (27.61µs)
                     [✓] cpu: system spent 2.27s of the last 60s waiting on cpu (16.02µs)
                     [✓] io: system spent 0s of the last 60s waiting on io (13.82µs)
role app     passing leader
```
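To make the pattern in those events easier to see, here is a small sketch (plain Python, standard library only) that parses “Recent Events” rows pasted as text and counts each event type. It assumes the rows keep the timestamp / type / message layout shown above; `summarize_events` and the `EVENT_TYPES` list are hypothetical helpers written for this post, not part of flyctl.

```python
import re
from collections import Counter

# Event types that appear in the "Recent Events" table. Some contain a space
# ("Task Setup", "Driver Failure"), so they are matched explicitly instead of
# splitting each row on whitespace.
EVENT_TYPES = (
    "Received", "Task Setup", "Started", "Terminated",
    "Restarting", "Driver Failure", "Not Restarting",
)

# TIMESTAMP, then one of the known TYPEs, then the free-text MESSAGE.
ROW = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s+("
    + "|".join(re.escape(t) for t in EVENT_TYPES)
    + r")\s+(.*)$"
)


def summarize_events(text):
    """Return (count of each event type, last parsed event or None)."""
    events = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # header lines and blank lines simply don't match
            events.append(match.groups())
    counts = Counter(kind for _, kind, _ in events)
    return counts, (events[-1] if events else None)


sample = """
TIMESTAMP TYPE MESSAGE
2023-02-03T18:12:48Z Terminated Exit Code: 2
2023-02-03T18:12:48Z Restarting Task restarting in 1.080633031s
2023-02-03T18:12:55Z Driver Failure rpc error: code = Unknown desc = unable to create microvm: could not find device for volume with name pg_data
2023-02-03T18:12:55Z Not Restarting Error was unrecoverable
"""

counts, last = summarize_events(sample)
print(counts["Terminated"])  # 1
print(last[1])               # Not Restarting
```

Run over the full event list above, this counts five “Terminated, Exit Code: 2” rows, which matches `Restarts = 5`; the sixth start attempt is the one that ends in the driver failure and “Not Restarting”.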
Here’s my remote config for the Postgres app:
```json
{
  "checks": {
    "pg": {
      "grace_period": "30s",
      "headers": [],
      "interval": "15s",
      "method": "get",
      "path": "/flycheck/pg",
      "port": 5500,
      "protocol": "http",
      "restart_limit": 0,
      "timeout": "10s",
      "tls_skip_verify": false,
      "type": "http"
    },
    "role": {
      "grace_period": "30s",
      "headers": [],
      "interval": "15s",
      "method": "get",
      "path": "/flycheck/role",
      "port": 5500,
      "protocol": "http",
      "restart_limit": 0,
      "timeout": "10s",
      "tls_skip_verify": false,
      "type": "http"
    },
    "vm": {
      "grace_period": "1s",
      "headers": [],
      "interval": "1m",
      "method": "get",
      "path": "/flycheck/vm",
      "port": 5500,
      "protocol": "http",
      "restart_limit": 0,
      "timeout": "10s",
      "tls_skip_verify": false,
      "type": "http"
    }
  },
  "env": {
    "PRIMARY_REGION": "lhr"
  },
  "experimental": {
    "auto_rollback": false,
    "enable_consul": true,
    "private_network": true
  },
  "kill_signal": "SIGTERM",
  "kill_timeout": 300,
  "metrics": {
    "path": "/metrics",
    "port": 9187
  },
  "mounts": [
    {
      "destination": "/data",
      "encrypted": false,
      "source": "pg_data"
    }
  ],
  "processes": [],
  "services": []
}
```
I believe this is the unchanged default config that was generated.
I can see that I still have a volume called “pg_data”, so I am unsure why it could not be found, or whether this is actually the cause of the never-ending “pending” deployment I am seeing.
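One way to double-check the volume side is to compare the `mounts` section of the config against the volumes the app actually has. The sketch below is hypothetical: `missing_mounts` takes the config JSON and a hand-typed list of (name, region) pairs (for example copied from `flyctl volumes list -a swedishbirds-db`) and reports any mount source with no matching volume. It also compares regions against `PRIMARY_REGION`, because a Fly volume lives in a single region; that is only an assumption worth ruling out here, not a confirmed cause. The `"ams"` region in the usage lines is made up for illustration.

```python
import json


def missing_mounts(config_json, volumes):
    """Return mount sources that have no matching volume.

    config_json: the app config as a JSON string, like the one above.
    volumes: hand-entered (name, region) tuples, e.g. copied from
             `flyctl volumes list` output (hypothetical input format).
    """
    config = json.loads(config_json)
    region = config.get("env", {}).get("PRIMARY_REGION")
    missing = []
    for mount in config.get("mounts", []):
        source = mount["source"]
        if region is None:
            # No region to compare against: match on the name alone.
            found = any(name == source for name, _ in volumes)
        else:
            # Assumption: the volume must have the mount's name AND live in
            # PRIMARY_REGION, where this instance runs (lhr in the post).
            found = (source, region) in set(volumes)
        if not found:
            missing.append(source)
    return missing


# Trimmed copy of the relevant parts of the config above.
config_json = """
{
  "env": {"PRIMARY_REGION": "lhr"},
  "mounts": [{"destination": "/data", "encrypted": false, "source": "pg_data"}]
}
"""

print(missing_mounts(config_json, [("pg_data", "lhr")]))  # []
print(missing_mounts(config_json, [("pg_data", "ams")]))  # ['pg_data']
```

An empty result only means the name and region line up; it says nothing about whether the underlying device on the host is healthy, which is what the “could not find device” message seems to be about.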