I don’t know why I get this when I run fly deploy: “Error App is unchanged, skipping release”.
@jsierles please help, how do I restart?
Hi @bogordesain! You’re probably aware, but this is likely due to this now-resolved incident: Stuck on Running release task (pending)... - #4 by catflydotio
@catflydotio I’m still having this issue in YYZ: the app is stuck on pending, and the logs show “[info] Unpacking image” and stay there.
I tried deleting the app and relaunching it several times, but the issue persists (I’ve been experiencing this since last night).
Could this be caused by a separate issue from the incident earlier today?
Thanks for letting us know! Since the API incident has settled down, we can definitely help dig into things a bit. If your app instances are pending, then we know that they’ve already been scheduled. So the issue would be with running the image on our VM.
It could help to check a couple of things:
- fly status --all: can show you things like crashing containers
- fly logs (it seems like you’ve already done this): will show you errors emitted by your app or its VM.
Posting that output here will help Fly pinpoint the problem more readily, and help the community give advice.
In this case it seems like the problem could be specific to your image. How are you deploying it to us? I’m guessing, too, that the container starts normally when you run it locally. What happens when you try a local deployment (i.e., launching with fly launch --no-deploy, then fly deploy --local-only)?
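If it helps, here’s a rough sketch of that sequence (assuming you keep the generated fly.toml and just want to rule out the remote builder):

  fly launch --no-deploy     # create the app and fly.toml without deploying anything
  fly deploy --local-only    # build the image with your local Docker daemon instead of a remote builder, then deploy it

Comparing how that behaves against a normal fly deploy would at least tell us whether the builder side is involved.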
Hi, here’s the status and logs. I did try fly launch --no-deploy and fly deploy --local-only: with --no-deploy, it launched the new app as expected, waiting to be deployed (pending); fly deploy --local-only built the image properly but got stuck on the deploy.
Here are the logs (I obfuscated my app name; hopefully the IDs are enough for you to locate it! If not, please let me know and I can PM you). Thank you!
Also, for all intents and purposes, the app code has not changed since the last successful deploy sometime yesterday.
Thanks in advance!
% fly status --all
App
  Name     = <obfuscated>
  Owner    = personal
  Version  = 1
  Status   = running
  Hostname = <obfuscated>.fly.dev
  Platform = nomad

Deployment Status
  ID          = 05b93c5e-a279-722d-4d61-32bde52ce12c
  Version     = v1
  Status      = failed
  Description = Failed due to unhealthy allocations - no stable job version to auto revert to
  Instances   = 1 desired, 1 placed, 0 healthy, 1 unhealthy

Instances
ID       PROCESS VERSION REGION DESIRED STATUS  HEALTH CHECKS RESTARTS CREATED
cb3e86bc app     1       yyz    run     pending               0        1h47m ago
2022-09-13T16:05:28Z runner[cb3e86bc] yyz [info]Starting instance
2022-09-13T16:05:28Z runner[cb3e86bc] yyz [info]Configuring virtual machine
2022-09-13T16:05:28Z runner[cb3e86bc] yyz [info]Pulling container image
2022-09-13T16:05:29Z runner[cb3e86bc] yyz [info]Unpacking image
2022-09-13T17:52:53Z runner[cb3e86bc] yyz [info]Starting instance
2022-09-13T17:52:54Z runner[cb3e86bc] yyz [info]Configuring virtual machine
2022-09-13T17:52:54Z runner[cb3e86bc] yyz [info]Pulling container image
2022-09-13T17:52:55Z runner[cb3e86bc] yyz [info]Unpacking image
@arcreactor7 There’s something going on on that host. It’s getting looked at now.
@arcreactor7 Can you try redeploying? Should be good to go.
I’m now getting the following errors repeatedly when I try to deploy. I even tried deleting the app and relaunching a new one, but still get the same errors… Thanks for looking into this!
1 desired, 1 placed, 0 healthy, 1 unhealthy
Failed Instances
Failure #1
Instance
ID       PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
16b43162 app     0       yyz    run     failed               0        2s ago
Recent Events
TIMESTAMP TYPE MESSAGE
2022-09-13T18:38:08Z Received Task received by client
2022-09-13T18:38:08Z Task Setup Building Task Directory
2022-09-13T18:38:13Z Driver Failure rpc error: code = Unknown desc = unable to create microvm: error pulling image: unknown
github.com/containerd/containerd/errdefs.init
/go/pkg/mod/github.com/containerd/containerd@v1.4.1/errdefs/errors.go:43
runtime.doInit
/usr/local/go/src/runtime/proc.go:6222
runtime.doInit
/usr/local/go/src/runtime/proc.go:6199
runtime.doInit
/usr/local/go/src/runtime/proc.go:6199
runtime.doInit
/usr/local/go/src/runtime/proc.go:6199
runtime.doInit
/usr/local/go/src/runtime/proc.go:6199
runtime.main
/usr/local/go/src/runtime/proc.go:233
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1571
failed to activate new thin device "data_0-nomadfc_layers-snap-127746" (dev: 1189): device or resource busy
github.com/containerd/containerd/errdefs.FromGRPC
/go/pkg/mod/github.com/containerd/containerd@v1.4.1/errdefs/grpc.go:107
github.com/containerd/containerd/snapshots/proxy.(*proxySnapshotter).Prepare
/go/pkg/mod/github.com/containerd/containerd@v1.4.1/snapshots/proxy/proxy.go:108
github.com/containerd/containerd/rootfs.applyLayers
/go/pkg/mod/github.com/containerd/containerd@v1.4.1/rootfs/apply.go:129
github.com/containerd/containerd/rootfs.ApplyLayerWithOpts
/go/pkg/mod/github.com/containerd/containerd@v1.4.1/rootfs/apply.go:102
github.com/containerd/containerd.(*image).Unpack
/go/pkg/mod/github.com/containerd/containerd@v1.4.1/image.go:324
github.com/superfly/nomad-firecracker/driver.pullImage
/app/driver/machine.go:168
github.com/superfly/nomad-firecracker/driver.NewMachine.func3
/app/driver/machine.go:549
github.com/superfly/nomad-firecracker/driver.NewMachine
/app/driver/machine.go:1141
github.com/superfly/nomad-firecracker/driver.(*Driver).StartTask
/app/driver/driver.go:988
github.com/hashicorp/nomad/plugins/drivers.(*driverPluginServer).StartTask
/go/pkg/mod/github.com/hashicorp/nomad@v0.12.0/plugins/drivers/server.go:105
github.com/hashicorp/nomad/plugins/drivers/proto._Driver_StartTask_Handler
/go/pkg/mod/github.com/hashicorp/nomad@v0.12.0/plugins/drivers/proto/driver.pb.go:4260
google.golang.org/grpc.(*Server).processUnaryRPC
/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1282
google.golang.org/grpc.(*Server).handleStream
/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1616
google.golang.org/grpc.(*Server).serveStreams.func1.2
/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:921
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1571
failed to prepare extraction snapshot "extract-948174911-gpLw sha256:0e474b185faf5e6625668cae1c08e63d80ba49f825ed0579eeba428b35ccfcf8"
github.com/containerd/containerd/rootfs.applyLayers
/go/pkg/mod/github.com/containerd/containerd@v1.4.1/rootfs/apply.go:146
github.com/containerd/containerd/rootfs.ApplyLayerWithOpts
/go/pkg/mod/github.com/containerd/containerd@v1.4.1/rootfs/apply.go:102
github.com/containerd/containerd.(*image).Unpack
/go/pkg/mod/github.com/containerd/containerd@v1.4.1/image.go:324
github.com/superfly/nomad-firecracker/driver.pullImage
/app/driver/machine.go:168
github.com/superfly/nomad-firecracker/driver.NewMachine.func3
/app/driver/machine.go:549
github.com/superfly/nomad-firecracker/driver.NewMachine
/app/driver/machine.go:1141
github.com/superfly/nomad-firecracker/driver.(*Driver).StartTask
/app/driver/driver.go:988
github.com/hashicorp/nomad/plugins/drivers.(*driverPluginServer).StartTask
/go/pkg/mod/github.com/hashicorp/nomad@v0.12.0/plugins/drivers/server.go:105
github.com/hashicorp/nomad/plugins/drivers/proto._Driver_StartTask_Handler
/go/pkg/mod/github.com/hashicorp/nomad@v0.12.0/plugins/drivers/proto/driver.pb.go:4260
google.golang.org/grpc.(*Server).processUnaryRPC
/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1282
google.golang.org/grpc.(*Server).handleStream
/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:1616
google.golang.org/grpc.(*Server).serveStreams.func1.2
/go/pkg/mod/google.golang.org/grpc@v1.44.0/server.go:921
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1571
2022-09-13T18:38:13Z Not Restarting Error was unrecoverable
2022-09-13T18:38:13Z Alloc Unhealthy Unhealthy because of failed task
2022-09-13T18:38:14Z Killing Sent interrupt. Waiting 5s before force killing
2022-09-13T18:38:09Z [info]Starting instance
2022-09-13T18:38:09Z [info]Configuring virtual machine
2022-09-13T18:38:09Z [info]Pulling container image
2022-09-13T18:38:10Z [info]Unpacking image
2022-09-13T18:38:11Z [info]Pull failed, retrying (attempt #0)
2022-09-13T18:38:11Z [info]Unpacking image
2022-09-13T18:38:12Z [info]Pull failed, retrying (attempt #1)
2022-09-13T18:38:12Z [info]Unpacking image
2022-09-13T18:38:13Z [info]Pull failed, retrying (attempt #2)
2022-09-13T18:38:13Z [info]Pulling image failed
--> v0 failed - Failed due to unhealthy allocations - no stable job version to auto revert to and deploying as v1
--> Troubleshooting guide at https://fly.io/docs/getting-started/troubleshooting/
Error abort
I can’t even delete the app now; it’s timing out. Please help, thank you!
I see this one host is alerting again. If you don’t have a volume, here’s a naive suggestion: try a redeploy in another region.
Luckily my volume is on a separate app. However, is there any way to “kill” this one before I attempt to launch in another region? (I’d like to keep this app name.)
…try again now?
Still timing out on delete.
I wonder what would happen if you did a fly regions set ord and redeployed.
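Something like this (a sketch; ord is just an example region to swap in for yyz):

  fly regions set ord    # replace the app’s region pool with ord
  fly deploy             # redeploy so the instance gets placed on a different host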
% fly regions set ord
Error server returned a non-200 status code: 504
lol, I think the ‘thing’ is just… unresponsive to anything
Oof. Sorry, that sucks. It does look like that app is properly deleted now.
So it looks like changing the region to something else worked when launching a new app, so probably something is wrong with the yyz region? I’m not sure how everything works on the back end, but I’m also not sure whether I was the only one in the yyz region (server?) that had the issue…
For future reference, does this happen from time to time, where a region/server goes rogue and I have to recreate the app in another region for it to work again?
But thanks for your help!
Sorry that was such a bad experience!
The gremlins were out in force today! That “failed to activate new thin device” error message you had indicated a stuck dmsetup on your host, which needed a manual kick. After today, we should catch and fix those faster (with better Nomad alerts).
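(For the curious: the manual kick is roughly along these lines on the host side, using the device name from your stack trace; the exact procedure on our hosts may differ:

  dmsetup info data_0-nomadfc_layers-snap-127746            # inspect the stuck thin device’s open count and state
  dmsetup remove --retry data_0-nomadfc_layers-snap-127746  # keep retrying removal until the “busy” holder lets go

Nothing you’d need to run yourself, of course.)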
Orchestration (Nomad) should take care of making sure your requested number of instances are successfully deployed, so you shouldn’t have to do that. But Nomad got stuck on that yyz host today (related, if I understand correctly, to the dmsetup borkage), so it didn’t.
The suggestion of moving regions was born of frustration, after thinking we were free and clear a couple of times and being wrong. But it’s not clear to me whether moving the app did anything in the end, or whether at that point Nomad was sorted out and you could have redeployed to yyz with the same result.
Anyway, thanks for your patience through all that. For what it’s worth, a couple of different gremlins did get flushed out and squashed by devs today.