We’re hitting a recurring issue when running long, non-TTY sprite exec commands and want to know whether there’s a known cause or guidance. The same pattern was reported in “Sprites becoming non-responsive after usage” and “Sprite cli timeouts since two days ago”; both threads were auto-closed.
Setup
- sprite version: v0.0.1-rc37
- We invoke sprite exec -s <sprite> -- bash -c "..." from a backend service. Runs are non-TTY and routinely take 5–30+ minutes.
Symptom
A long sprite exec exits with:
Error: connection closed
(exit 1)
After that, every subsequent sprite exec, sprite console, and sprite api -s <name> /v1/machines call against that one sprite hangs for ~28 s and then either times out or returns a 404:
Error: failed to start sprite command: failed to connect:
read tcp [<local v6>]:NNNN->[2a09:8280:1::99:d7b3:0]:443: i/o timeout
(also reproduces on IPv4 to 169.155.48.226:443)
Other sprites in the same org are reachable from the same client at the same time, and sprite list continues to work — so it’s scoped to one sprite, not org-wide and not a client-network issue. We’ve seen this on two different sprites within ~24 hours.
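The scoping argument above can be automated: other sprites and sprite list still work, but this one sprite does not. Below is a minimal Python sketch, assuming the backend shells out to the CLI. It runs a trivial sprite exec -s <name> -- true, and if that fails, it runs sprite list. Each call gets a short client-side timeout. The 10-second timeout, the function names, and the label strings are hypothetical, and the probe assumes true exists inside the sprite. It does not query any sprite status endpoint. It only infers state from whether these commands complete.

```python
from __future__ import annotations

import subprocess
from typing import Callable

Runner = Callable[..., subprocess.CompletedProcess]

# Hypothetical probe timeout, well under the ~28 s hang seen on a stuck sprite.
PROBE_TIMEOUT_S = 10


def completes_ok(argv: list[str], runner: Runner, timeout_s: float) -> bool:
    """True if the command exits 0 within timeout_s."""
    try:
        proc = runner(argv, stdin=subprocess.DEVNULL, capture_output=True,
                      text=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # OSError covers e.g. the CLI binary not being found.
        return False
    return proc.returncode == 0


def diagnose(sprite: str, runner: Runner = subprocess.run,
             timeout_s: float = PROBE_TIMEOUT_S, cli: str = "sprite") -> str:
    """Infer where a failure sits from at most two cheap CLI calls."""
    if completes_ok([cli, "exec", "-s", sprite, "--", "true"], runner, timeout_s):
        return "reachable"
    if completes_ok([cli, "list"], runner, timeout_s):
        # Org-level listing works but this one sprite does not respond.
        return "sprite_unreachable"
    # Neither works: more likely client networking or an org-wide problem.
    return "client_or_org_issue"
```

The runner parameter exists only so the logic can be exercised without a real sprite. In production it defaults to subprocess.run.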
Workaround
sprite restore <checkpoint> -s <name> brings the sprite back immediately. We lose any in-flight work that wasn’t committed to git.
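A restore discards uncommitted work, so an automated recovery should fire only after the sprite has repeatedly failed a reachability check. Below is a minimal Python sketch of that guard. The only CLI syntax it uses is sprite restore <checkpoint> -s <name> as written above. The stuck check is passed in as a callable, for example one wrapping a short-timeout sprite exec -s <name> -- true. The function names, the two-confirmation rule, and the 120-second restore timeout are hypothetical.

```python
from __future__ import annotations

import subprocess
from typing import Callable

Runner = Callable[..., subprocess.CompletedProcess]


def build_restore_argv(checkpoint: str, sprite: str, cli: str = "sprite") -> list[str]:
    """argv for: sprite restore <checkpoint> -s <name>."""
    return [cli, "restore", checkpoint, "-s", sprite]


def recover_if_stuck(
    sprite: str,
    checkpoint: str,
    is_stuck: Callable[[str], bool],
    runner: Runner = subprocess.run,
    confirmations: int = 2,
    restore_timeout_s: float = 120,
) -> str:
    """Restore only after `confirmations` consecutive stuck checks.

    Returns "healthy", "restored", or "restore_failed".
    """
    # A restore discards uncommitted work, so a single transient
    # failure is not enough to trigger it.
    for _ in range(confirmations):
        if not is_stuck(sprite):
            return "healthy"
    try:
        proc = runner(build_restore_argv(checkpoint, sprite),
                      capture_output=True, text=True, timeout=restore_timeout_s)
    except subprocess.TimeoutExpired:
        return "restore_failed"
    return "restored" if proc.returncode == 0 else "restore_failed"
```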
Questions
- Is this a known issue? It seems to occur after a long non-TTY exec has its connection dropped.
- Is there a graceful equivalent of “restart this sprite” that’s lighter than a checkpoint restore?
- What’s the recommended pattern for reliable long-running workloads on a sprite? Should we run a persistent service inside the sprite (as another community member mentioned) rather than a separate sprite exec per invocation?
- Is there a way to query from the CLI whether a sprite is in this stuck state, so we can detect it programmatically instead of waiting out the 28-second timeout?
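One lightweight variant of the persistent-process idea in the third question is sketched below in Python. It rests on an assumption that is not confirmed anywhere in this thread: a process started with nohup inside the sprite keeps running after the sprite exec that launched it exits. Under that assumption, one short exec launches the job in the background, and later short execs poll for its exit-status file, each with its own timeout. A dropped connection then costs one poll rather than the whole run. This does not address the stuck state itself. The paths, function names, and intervals are hypothetical.

```python
from __future__ import annotations

import shlex
import subprocess
import time
from typing import Callable, Optional

# Hypothetical paths inside the sprite.
LOG_PATH = "/tmp/job.log"
EXIT_PATH = "/tmp/job.exit"


def exec_argv(sprite: str, script: str, cli: str = "sprite") -> list[str]:
    return [cli, "exec", "-s", sprite, "--", "bash", "-c", script]


def start_script(job: str, log_path: str = LOG_PATH, exit_path: str = EXIT_PATH) -> str:
    """Shell snippet that backgrounds `job` and returns at once.

    When the job ends, its exit status is written to exit_path (via a
    temp file + mv so a poll never sees a half-written file).
    ASSUMPTION (unverified for Sprites): the nohup'd process outlives
    the sprite exec session that started it.
    """
    tmp = exit_path + ".tmp"
    inner = (f"bash -c {shlex.quote(job)}; "
             f"echo $? > {shlex.quote(tmp)} && mv {shlex.quote(tmp)} {shlex.quote(exit_path)}")
    return (f"rm -f {shlex.quote(exit_path)} {shlex.quote(tmp)}; "
            f"nohup bash -c {shlex.quote(inner)} > {shlex.quote(log_path)} 2>&1 < /dev/null &")


def poll_script(exit_path: str = EXIT_PATH) -> str:
    """Prints the job's exit status if it has finished, otherwise 'running'."""
    q = shlex.quote(exit_path)
    return f"if [ -f {q} ]; then cat {q}; else echo running; fi"


def parse_poll(stdout: str) -> Optional[int]:
    """None while running; the job's exit status once finished."""
    text = stdout.strip()
    return None if text == "running" else int(text)


def wait_for_job(
    sprite: str,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    poll_interval_s: float = 30,
    poll_timeout_s: float = 15,
    max_failed_polls: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll with short, independently timed-out execs until the job ends.

    A dropped or hung poll only counts as a failed poll; too many
    consecutive failures raise RuntimeError.
    """
    failed = 0
    while True:
        try:
            proc = runner(exec_argv(sprite, poll_script()), capture_output=True,
                          text=True, timeout=poll_timeout_s)
        except subprocess.TimeoutExpired:
            proc = None
        if proc is not None and proc.returncode == 0:
            failed = 0
            status = parse_poll(proc.stdout)
            if status is not None:
                return status
        else:
            failed += 1
            if failed >= max_failed_polls:
                raise RuntimeError(f"{sprite}: {failed} consecutive polls failed")
        sleep(poll_interval_s)
```

Usage would be one short subprocess.run(exec_argv(name, start_script("...")), timeout=...) followed by wait_for_job(name). The loop has no overall deadline. If the sprite is restored mid-run, the exit file never appears and polls keep returning "running", so a real service would need a deadline as well.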
Happy to share IDs / timestamps / --debug logs privately if useful.