sprite exec drops mid-run

We’re hitting a recurring issue running long non-TTY sprite exec commands and want to know if there’s a known cause or guidance. The same pattern was reported in two earlier threads, “Sprites becoming non-responsive after usage” and “Sprite cli timeouts since two days ago”, both of which were auto-closed.

Setup

  • sprite version v0.0.1-rc37
  • We invoke sprite exec -s <sprite> -- bash -c "..." from a backend service. Runs are non-TTY and routinely 5–30+ minutes.

Symptom

A long sprite exec exits with:

Error: connection closed

(exit 1)

After that, every subsequent sprite exec, sprite console, and sprite api -s <name> /v1/machines against that one sprite hangs for ~28s, then times out or returns a 404:

Error: failed to start sprite command: failed to connect:
  read tcp [<local v6>]:NNNN->[2a09:8280:1::99:d7b3:0]:443: i/o timeout

(also reproduces on IPv4 to 169.155.48.226:443)

Other sprites in the same org are reachable from the same client at the same time, and sprite list continues to work — so it’s scoped to one sprite, not org-wide and not a client-network issue. We’ve seen this on two different sprites within ~24 hours.

Workaround

sprite restore <checkpoint> -s <name> brings the sprite back immediately. We lose any in-flight work that wasn’t committed to git.
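
For anyone who wants to automate deciding when a restore is needed, here is a minimal client-side sketch. It is only an assumption about how this could be done, not a supported health check: it runs a trivial sprite exec under a short deadline instead of waiting out the ~28s hang. The function names run_with_deadline and sprite_state and the SPRITE_BIN override are hypothetical, the 5-second default is arbitrary, and it relies on the GNU coreutils timeout command being installed.

```shell
#!/usr/bin/env bash

# run_with_deadline SECONDS COMMAND [ARGS...]
# Runs COMMAND under GNU coreutils `timeout`. Exit status is COMMAND's own,
# or 124 if the deadline passed first (the coreutils convention).
run_with_deadline() {
  local deadline="$1"
  shift
  timeout "$deadline" "$@"
}

# sprite_state NAME [SECONDS]
# Prints "ok", "stuck", or "error" based on a trivial exec probe.
# SPRITE_BIN is a hypothetical override for the CLI path (useful for testing).
sprite_state() {
  local name="$1" deadline="${2:-5}" rc=0
  run_with_deadline "$deadline" "${SPRITE_BIN:-sprite}" exec -s "$name" -- true \
    >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0)   echo ok ;;     # probe completed within the deadline
    124) echo stuck ;;  # probe hung past the deadline
    *)   echo error ;;  # probe failed fast (for example the 404 case)
  esac
}
```

A caller could run sprite_state <name> before each job and fall back to the restore command above only when it prints stuck. An error result is ambiguous and probably deserves a look at the CLI output before restoring.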

Questions

  1. Is this a known issue? It seems to occur after a long non-TTY exec has its connection dropped.
  2. Is there a graceful equivalent of “restart this sprite” that’s lighter than a checkpoint restore?
  3. Recommended pattern for reliable long-running workloads on a sprite — should we be using a persistent service inside the sprite (referenced by another community member) rather than per-invocation sprite exec?
  4. Any way to query from the CLI whether a sprite is in this stuck state, so we can detect it programmatically instead of waiting for the 28-second timeout?

Happy to share IDs / timestamps / --debug logs privately if useful.

bumping this so it doesn’t auto close

Hi there! Sprites stay awake while a process started via exec/console is writing stdout to its TTY. If there’s no TTY output, the Sprite goes idle, which seems to be what’s happening here.

If you’d like to run a long workload on a Sprite, your best bet is to use a service (like you mentioned) or to make sure that whatever exec command you’re running writes to the TTY periodically (roughly every 5 minutes or more often).
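
The periodic-output suggestion could be sketched as a small bash wrapper that prints a heartbeat line while the real command runs. This is only an illustration: with_heartbeat is a hypothetical helper name, the interval is up to you, and whether stdout from a non-TTY exec counts as activity the same way TTY output does is an assumption this thread does not confirm.

```shell
#!/usr/bin/env bash

# with_heartbeat INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND while a background loop prints a timestamped line to stdout
# every INTERVAL_SECONDS, then returns COMMAND's exit status.
with_heartbeat() {
  local interval="$1" rc=0
  shift

  # Background loop: emit one line per interval until it is killed.
  (
    while sleep "$interval"; do
      printf '[heartbeat] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    done
  ) &
  local hb_pid=$!

  # Run the real workload in the foreground and keep its exit status.
  "$@" || rc=$?

  # Stop the heartbeat loop. An in-flight `sleep` may linger for up to one
  # interval before exiting on its own.
  kill "$hb_pid" 2>/dev/null
  wait "$hb_pid" 2>/dev/null
  return "$rc"
}
```

The function would need to be defined inside the script passed to bash -c in the sprite exec call, followed by something like with_heartbeat 60 ./long_job.sh, where ./long_job.sh is a placeholder for the actual workload and 60 seconds is just an example that stays well under the ~5 minute guidance.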