Adds visibility into how much of the model context window the chat agent
is using and a way to free space when it fills up.
Backend
- New ContextUsage{used_chars, budget_chars} returned from chat_send
alongside messages (return type ChatTurnResult). Computed by running
build_history once at end of turn and counting char bytes — same data
path as the actual LLM call, so the count is exact for the chosen
budget unit.
- CONTEXT_BUDGET_CHARS = 24,000 (~6-8K tokens). Tuned for Ollama
defaults; can be exposed via AiSettings later.
- New chat_compact Tauri command. Splits the thread at the last user
turn, LLM-summarises everything before it (3-6 bullet points,
language-aware, < 800 chars), and returns a thread of
[Assistant("📋 Compacted N messages: …"), <last_user_turn?>]. The
recent user turn is preserved untouched so the agent can keep
answering it.
- render_thread_for_summary skips QueryResult.rows entirely so a single
large run_query can't blow the summariser's context.
- 3 new unit tests (last_user_turn_index, render skipping rows, empty
thread no-op).
Frontend
- ChatPanel header gets a usage badge: progress bar + `Xk / Yk tok ·
P%`, color-coded green (<30%) / muted (<60%) / amber (<85%) / red
(≥85%). Tooltip explains and nudges /compact when ≥60%.
- Compact button next to Clear in the header.
- Slash commands in ChatComposer: /compact, /clear.
- Empty-state shows the slash-command hint.
- Auto-compact: if the previous turn pushed usage past 85% AND the
thread has more than one message, the next user turn first runs
chat_compact transparently before chat_send. The compaction surfaces
as a visible Assistant("📋 Compacted …") message so the user can see
what the agent kept.
- app-store gets chatUsage map per tab + replaceChatThread + setChatUsage
actions; closeTab and clearChatThread clean up usage too.
Verification: cargo check clean, cargo test --lib 53 pass (+3),
tsc --noEmit clean, vitest run 20 pass.