Summary
When attaching to an existing Sprites terminal session via the exec WebSocket API, CJK (Chinese/Japanese/Korean) characters in the scrollback replay data are each followed by a U+FFFD (Replacement Character), causing garbled display in xterm.js. Live (real-time) output renders correctly.
Steps to Reproduce
- Create a Sprites exec session with tty: true
- Output text containing CJK characters (e.g., run a command that prints Japanese text) until the scrollback buffer is populated
- Attach to the same session from a new WebSocket connection (/v1/sprites/{name}/exec/{sessionId})
- Scroll up to view the replayed scrollback history
- CJK characters each have a U+FFFD appended
Root Cause
The screen buffer appears to store a U+FFFD placeholder in the second cell of each wide (fullwidth) character. When the buffer is serialized and sent as replay data, these placeholders are included in the byte stream.
Terminal emulators like xterm.js manage wide character column widths internally, so the second-cell placeholder is unnecessary and renders as a visible replacement character.
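To illustrate the suspected cause, here is a minimal sketch of a serializer that skips the second cell of a wide character instead of emitting its placeholder. The `cell` type, its `width` field, and `serializeRow` are hypothetical; the actual Sprites screen buffer layout is not known to us.

```go
package main

import (
	"fmt"
	"strings"
)

// cell is a hypothetical screen-buffer cell, not the real Sprites type.
type cell struct {
	r     rune // character stored in this cell
	width int  // 2 = leading cell of a wide char, 1 = narrow char, 0 = continuation cell
}

// serializeRow writes one row of cells as UTF-8, skipping continuation
// cells so that their placeholder rune never reaches the byte stream.
func serializeRow(row []cell) string {
	var sb strings.Builder
	for _, c := range row {
		if c.width == 0 {
			continue // second column of a wide character: emit nothing
		}
		sb.WriteRune(c.r)
	}
	return sb.String()
}

func main() {
	row := []cell{
		{'こ', 2}, {'\uFFFD', 0},
		{'ん', 2}, {'\uFFFD', 0},
		{'!', 1},
	}
	fmt.Printf("%q\n", serializeRow(row)) // prints "こん!"
}
```

The receiving terminal advances two columns for each wide character on its own, so nothing needs to be sent for the continuation cell.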
Hex dump evidence
# Each CJK char (3-byte UTF-8) is followed by ef bf bd (U+FFFD)
e38193 efbfbd e38293 efbfbd e381ab efbfbd e381a1 efbfbd
こ FFFD ん FFFD に FFFD ち FFFD
# Same pattern with ANSI color escapes:
\e[38;5;231m 適 \e[0m FFFD \e[38;5;231;48;5;237m 当 \e[0m FFFD
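The first dump can be checked independently by decoding the bytes and printing the resulting code points. This is a small verification sketch; `decodeDump` is a helper name introduced here for illustration.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// decodeDump converts a space-separated hex dump into the string it encodes.
func decodeDump(dump string) (string, error) {
	raw, err := hex.DecodeString(strings.ReplaceAll(dump, " ", ""))
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	s, err := decodeDump("e38193 efbfbd e38293 efbfbd e381ab efbfbd e381a1 efbfbd")
	if err != nil {
		panic(err)
	}
	// Each hiragana code point should be followed by U+FFFD.
	for _, r := range s {
		fmt.Printf("U+%04X ", r)
	}
	fmt.Println()
}
```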
Observed message sequence
| # | Type | Content |
|---|---|---|
| 1 | Text | session_info (tty: true) |
| 2 | Binary (71 KB) | Replay/scrollback data — contains U+FFFD after every wide char |
| 3+ | Binary | Live PTY data — no U+FFFD, renders correctly |
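Since only the replay frame is affected, a proxy could in principle limit any filtering to that frame. The sketch below assumes the replay always arrives as exactly one binary frame, the first one on the connection, which matches the sequence we observed but is not something we have confirmed in general. `replayFilter` and its method are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
)

var replacementChar = []byte{0xef, 0xbf, 0xbd} // UTF-8 encoding of U+FFFD

// replayFilter strips U+FFFD from the first binary frame only.
// Assumption: the replay is exactly one binary frame, sent before live data.
type replayFilter struct {
	replaySeen bool
}

// filter returns the frame payload to forward to the client.
func (f *replayFilter) filter(isBinary bool, data []byte) []byte {
	if !isBinary || f.replaySeen {
		return data // text frames and live PTY data pass through untouched
	}
	f.replaySeen = true
	return bytes.ReplaceAll(data, replacementChar, nil)
}

func main() {
	f := &replayFilter{}
	fmt.Printf("%q\n", f.filter(false, []byte("session_info")))   // text frame, unchanged
	fmt.Printf("%q\n", f.filter(true, []byte("こ\uFFFDん\uFFFD"))) // replay frame, stripped
	fmt.Printf("%q\n", f.filter(true, []byte("live \uFFFD")))     // live frame, unchanged
}
```

This keeps a legitimate U+FFFD in live output intact, at the cost of relying on the frame ordering assumption above.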
Current Workaround
We strip all U+FFFD bytes (ef bf bd) from binary frames in our WebSocket proxy before forwarding to the client:
var utf8ReplacementChar = []byte{0xef, 0xbf, 0xbd}

// in the sprites→client forwarding loop:
if msgType == websocket.BinaryMessage {
	data = bytes.ReplaceAll(data, utf8ReplacementChar, nil)
}
This is safe in practice because U+FFFD rarely appears in legitimate terminal output, and the replacement becomes a no-op once the underlying issue is fixed.
Expected Behavior
Replay data should not contain U+FFFD wide-character placeholders. The serialized screen buffer should emit only the actual character for each wide character cell, omitting the second-column placeholder.
Environment
- Sprites exec WebSocket API (wss://api.sprites.dev/v1/sprites/{name}/exec/{sessionId})
- Client: xterm.js v5 with Unicode11 addon
- Locale: ja_JP.UTF-8