Some background on why I tried updating the image: while this problem was still happening, I couldn’t connect to the database from the web app — it reported “host not found”. So I tried redeploying the database to fix it, and as a result the problem got more complicated.
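For context, “host not found” means the app failed to resolve the database hostname at all, as opposed to the database refusing connections. A minimal sketch of that distinction (the hostname shown is hypothetical; a Fly Postgres app is normally reached via its private address):

```python
import socket

def resolvable(hostname: str) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        # A "host not found" error surfaces here as a resolution failure,
        # before any TCP connection to the database is even attempted.
        return False

# Hypothetical example: the private hostname only resolves from inside
# the Fly network, so checking it from outside will return False.
resolvable("geeknote-postgres.internal")
```

If this returns False from inside the app’s VM, the problem is DNS/instance placement rather than Postgres itself — which matches instances being stuck in pending.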
App
Name = geeknote-postgres
Owner = geeknote-net
Version = 6
Status = pending
Hostname = geeknote-postgres.fly.dev
Deployment Status
ID = 6b1cd140-70f2-e66d-0d66-91c2431a3a4f
Version = v6
Status = running
Description = Deployment is running
Instances = 1 desired, 0 placed, 0 healthy, 0 unhealthy
Instances
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
After scaling down to shared-cpu-1x, an instance is created but stuck in the pending state:
App
Name = geeknote-postgres
Owner = geeknote-net
Version = 12
Status = running
Hostname = geeknote-postgres.fly.dev
Deployment Status
ID = da7927a7-780d-f2af-2415-036a95bb0308
Version = v12
Status = running
Description = Deployment is running
Instances = 1 desired, 1 placed, 0 healthy, 0 unhealthy
Instances
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
9f20a649 app 12 ⇡ hkg run pending 0 2m11s ago
6a9a81a3 app 11 hkg stop pending 0 10m20s ago
377b3d9f app 10 hkg stop pending 0 12m4s ago
142e92c3 app 9 hkg stop pending 0 17m32s ago
Hey there, can you share the output of flyctl logs so we can see if there’s any useful information about what’s going on, please?
We did bring up more server capacity in hkg, but that was about a day ago and it seems you’re still experiencing issues with your app in pending status, so we need to make sure there’s nothing else going on.
Alright, thank you so much for that!
It turns out we’re actually experiencing some hardware issues in hkg; you can check the status page for updates as we work on this issue.
This is the third time I have run into resource exhaustion in hkg, and twice it took my server down. Problems like this should be caught in advance through monitoring.
Please consider introducing a support ticket system. The advantage of the community forum is that you can see everyone’s problems and reach the developers, but when I hit an urgent problem, not knowing when it will be addressed makes me panic.
My past experience does not reassure me about recommending Fly.io for production applications. I hope Fly.io improves its stability and customer support. Thanks again.
Glad you’re up and running again!
We do have monitoring systems in place; this was actually a new bug that our monitoring didn’t detect properly. We add additional checks when this type of situation comes up.
We do recommend running 2+ instances of any database that needs high availability. With that kind of setup, your app would have stayed online even with one instance failing — it gives you a fail-safe against these types of hardware failures.
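The fail-safe described above can also be reflected on the client side: try the primary first, then fall back to a replica. A minimal connectivity-level sketch (the hostnames and helper name are hypothetical, not part of any Fly API):

```python
import socket

def first_reachable(hosts, timeout=2.0):
    """Return the first (host, port) that accepts a TCP connection, else None.

    `hosts` is an ordered list of (hostname, port) candidates,
    e.g. primary first, then replicas.
    """
    for host, port in hosts:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            continue  # unreachable, refused, or timed out; try the next one
    return None

# Hypothetical primary/replica pair: with 2+ instances, one failing node
# does not take the whole app down.
candidates = [("primary.internal", 5432), ("replica.internal", 5432)]
```

This only checks TCP reachability; a real Postgres client would additionally verify the node accepts writes before treating it as the primary.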
We have both a “dev” and “prod” deployment of flyio/redis:6.2.6 in our environment.
They were also at the shared-cpu-1x VM size with 256MB memory.
And this morning, they were both stuck in “pending”, with the app logs showing:
2022-07-12T12:42:30Z runner[f0d06629] iad [info]Shutting down virtual machine
2022-07-12T12:42:30Z app[f0d06629] iad [info]Sending signal SIGINT to main child process w/ PID 523
2022-07-12T12:42:30Z app[f0d06629] iad [info]redis | Interrupting...
2022-07-12T12:42:30Z app[f0d06629] iad [info]metrics | Interrupting...
2022-07-12T12:42:30Z app[f0d06629] iad [info]redis | 538:signal-handler (1657629750) Received SIGINT scheduling shutdown...
2022-07-12T12:42:30Z app[f0d06629] iad [info]metrics | signal: interrupt
2022-07-12T12:42:30Z app[f0d06629] iad [info]redis | 538:M 12 Jul 2022 12:42:30.259 * DB saved on disk
2022-07-12T12:42:30Z app[f0d06629] iad [info]redis | 538:M 12 Jul 2022 12:42:30.259 # Redis is now ready to exit, bye bye...
2022-07-12T12:42:30Z app[f0d06629] iad [info]redis | signal: interrupt
2022-07-12T12:42:31Z app[f0d06629] iad [info]Main child exited normally with code: 0
2022-07-12T12:42:31Z app[f0d06629] iad [info]Starting clean up.
2022-07-12T12:42:31Z app[f0d06629] iad [info]Umounting /dev/vdc from /data
I couldn’t get it restarted; scaling the count to 0 and back to 1, as well as new deployment attempts, all failed…
To back up the comments above: running fly scale vm dedicated-cpu-1x --app got those Redis instances back up and running.