Context and model loop architecture
This page is the architecture analysis for the context-and-model-loop module. It complements the implementation pages in this chapter by focusing on how context is layered into a model request, how provider/auth selection is shaped, and how headless/SDK streaming is decomposed rather than re-listing prompt or template strings.
Scope: from a resolved runtime session and root-action options to a model-visible request, a streaming response, and the headless/SDK frame multiplex. Implementation specifics live in Prompt, context, and memory, Prompt assembly scenarios, Context, memory, compaction, checkpoints, and rewind, Prompt template catalog, Models, providers, and auth, Model selection, calls, usage, quota, and billing, and Headless streaming and resilience.
Module purpose
This module owns the request side of the agent loop: what the model sees, which provider serves the request, and how the streamed response is multiplexed into runtime state and (optionally) SDK/headless frames. It is intentionally separated from tool execution: this module decides what the model can perceive, while the tool/permission module decides what the model can do.
Architecture thesis
The context/model loop is a layered assembler plus a streaming multiplexer:
- The assembler converts heterogeneous inputs (CLI flags, memory files, settings, plugins, MCP prompts/resources, tools, agents, session history) into a single model-visible request.
- The multiplexer hides provider differences behind a shared streaming contract and exposes a uniform frame protocol for headless/SDK consumers.
This separation lets the runtime support interactive TUI and scripted/SDK transports with one context pipeline.
Source anchors
| Semantic alias | Source | Approximate location | String or symbol | Architectural meaning |
|---|---|---|---|---|
| ManagedMemoryPolicy | cli.js | line ~185, byte 0x11e1cb | CLAUDE.md-style instructions injected as organization-managed memory | Managed memory schema; org policy participates in context. |
| LocalRuleMemoryRoots | cli.js | line ~1231, byte 0x49b032 | .claude/rules, CLAUDE.local.md | Rule and local memory file roots. |
| DynamicPromptBoundaryFlag | cli.js | line ~19525, byte 0xdc0fb6 | --exclude-dynamic-system-prompt-sections | Separates stable prompt content from per-machine sections. |
| SystemPromptOverrideFlag | cli.js | line ~19525, byte 0xdc0c89 | --system-prompt <prompt> | Replaces the system prompt. |
| SystemPromptAppendFlag | cli.js | line ~19525, byte 0xdc0d5f | --append-system-prompt <prompt> | Adds to the default system prompt. |
| OutputStyleContextSchema | cli.js | line ~185, byte 0x11087d | outputStyles | Plugin/settings-contributed output style schema. |
| SlashCommandContextSurface | cli.js | line ~4965, byte 0x924930 | slashCommands | Slash commands counted as context. |
| TranscriptContextAssembler | cli.js | line ~5579, byte 0x996ba7 | async function _O5({transcriptPath:H,scope:$="session",maxRawTranscriptBytes:q}) | Transcript-derived context assembler. |
| ProviderClassifier | cli.js | line ~253, byte 0x1ed452 | CLAUDE_CODE_USE_BEDROCK, ..._VERTEX, ..._FOUNDRY, ..._MANTLE, ..._ANTHROPIC_AWS | Provider classifier branches. |
| CredentialResolver | cli.js | line ~43, byte 0x264c0 | ANTHROPIC_API_KEY, ANTHROPIC_AUTH_TOKEN | Credential resolution; key vs bearer differs in downstream headers. |
| ModelSelectionFlag | cli.js | line ~19525, byte 0xdc18ed | --model <model> | Per-session model selection. |
| FallbackModelFlag | cli.js | line ~19525, byte 0xdc1b54 | --fallback-model <model> | Fallback model for print/headless mode. |
| PerTurnModelResolver | cli.js | line ~253, byte 0x20ef88 | nG({permissionMode,mainLoopModel,exceeds200kTokens}) | Per-turn model resolver can alter model by mode/context. |
| ApiUsageAccounting | cli.js | line ~2027, byte 0x52af3a | api_request, cost_usd, input_tokens, output_tokens | Provider-call accounting and telemetry. |
| UnifiedRateLimitHeaders | cli.js | line ~793, byte 0x4406aa | anthropic-ratelimit-unified-* | Unified rate-limit/quota headers parsed into runtime state. |
| HeadlessBudgetGuard | cli.js | line ~19323, byte 0xda0191 | error_max_budget_usd | Headless budget guard result subtype. |
| HeadlessMcpCoordinator | cli.js | line ~19543, byte 0xdc73fa | let o4=fH9({regularMcpConfigs:Ww | Headless branch creates the MCP coordinator before the model loop. |
| HeadlessRunner | cli.js | line ~19324, byte 0xda31bb | async function T7A | Headless runner; validates print/SDK constraints. |
| HeadlessFrameMultiplexer | cli.js | line ~19349, byte 0xda50d0 | function H89 | Headless streaming/control multiplexer. |
| HeadlessOutboundChannel | cli.js | line ~19349, byte 0xda50d0 | let h=H.outbound | Outbound stream/channel abstraction inside H89. |
| RateLimitStreamFrame | cli.js | line ~2004, byte 0x516307 | rate_limit_event | Rate-limit changes projected to SDK consumers. |
| PromptSuggestionFrame | cli.js | line ~2004, byte 0x519daf | prompt_suggestion | Predicted next-prompt frame emitted after a turn. |
| SessionStateChangedFrame | cli.js | line ~7240, byte 0xb2d4fc | session_state_changed | Idle/running/requires_action state pushed alongside model frames. |
| TranscriptMirrorFrame | cli.js | line ~7240, byte 0xb2d257 | transcript_mirror | Local transcript mirror frame in stream-JSON mode. |
| SdkFrameAdapterFilter | cli.js | line ~9434, byte 0xc5950c | case "rate_limit_event": return N("[sdkMessageAdapter] Ignoring rate_limit_event message") | SDK adapter explicitly handles a subset of frame types. |
| CompactionHookLifecycle | cli.js | line ~185, byte 0x10b77c | PreCompact, PostCompact | Compaction lifecycle hooks around context shrinking. |
| AutoCompactionThreshold | cli.js | line ~4963, byte 0x921035 | autoCompactEnabled, DISABLE_AUTO_COMPACT, autocompact: tokens= | Auto-compaction gate and threshold path. |
Internal decomposition
flowchart TD Inputs[CLI flags + settings + memory + plugins + MCP + agents + session history] --> Sources[Context sources] Sources --> Stable[Stable system-prompt sections] Sources --> Dynamic[Dynamic per-machine sections] Sources --> Tools[Tool/agent/MCP metadata] Sources --> Transcript[Recent + subagent transcripts]
Stable --> Assembler[Prompt assembler] Dynamic --> Assembler Tools --> Assembler Transcript --> Assembler
Assembler --> Compaction[Compaction policy and budget guard] Compaction --> Request[Provider request]
Auth[Credential resolver] --> ProviderRouter[Provider classifier] Models[Model flags / aliases / fallback] --> ProviderRouter ProviderRouter --> Request
Request --> ModelStream[Streaming response] ModelStream --> ToolUse[Tool-use deltas to permission boundary] ModelStream --> TextUI[Assistant text to TUI / stream-json]
ModelStream --> Mux[Headless frame multiplexer] Mux --> Result[final result frame] Mux --> RateLimit[rate_limit_event] Mux --> Suggestions[prompt_suggestion] Mux --> State[session_state_changed] Mux --> Mirror[transcript_mirror] Mux --> Bridge[bridge_state]The module composes three sub-components:
| Sub-component | Responsibility |
|---|---|
| Context sources | Heterogeneous inputs (memory, settings, plugins, MCP, agents, tools, session history, output styles, slash commands). |
| Prompt assembler + compaction | Produces the model-visible request, applies --exclude-dynamic-system-prompt-sections, runs PreCompact/PostCompact hooks, and enforces budget/turn limits. |
| Provider/auth router | Picks credentials, classifies provider, sets model/fallback, prepares headers, and abstracts over Anthropic/Bedrock/Vertex/Foundry/Mantle/Anthropic AWS. |
The HeadlessFrameMultiplexer wraps the model stream and adds non-model frames (rate limit, suggestions, state, transcript mirror, bridge state) without coupling them to provider details.
Public interface
Inputs
| Source | Effect |
|---|---|
--system-prompt[-file], --append-system-prompt[-file] | Replace or extend the system prompt. |
--exclude-dynamic-system-prompt-sections | Move per-machine content (cwd, env, memory paths, git status) out of cache-sensitive sections. |
--add-dir, --file | Add tool-access directories and inject file resources into early context. |
--model, --fallback-model, --effort, --thinking*, --max-turns, --max-budget-usd, --task-budget, --betas | Shape provider routing, thinking mode, budget guards, and beta headers. |
CLAUDE.md, CLAUDE.local.md, .claude/rules, .claude/settings.json, managed settings, outputStyles | Memory and presentation layers fed into the assembler. |
ANTHROPIC_API_KEY, ANTHROPIC_AUTH_TOKEN, OAuth/file-descriptor token vars, provider CLAUDE_CODE_USE_* gates | Credential and provider classification. |
| Slash commands, skills, custom agents, MCP prompts/resources, tools | Add capability metadata and prompt fragments. |
Outputs
| Output | Consumer |
|---|---|
| Provider request | Streaming model API (Anthropic, Bedrock, Vertex, Foundry, Mantle, Anthropic AWS). |
assistant and tool_use deltas | Forwarded to TUI renderer or stream-JSON adapter. |
Headless frames (result, rate_limit_event, prompt_suggestion, session_state_changed, transcript_mirror, bridge_state, task_notification, plugin_install) | Headless/SDK consumers, transcript writers, remote bridge. |
Compaction events (PreCompact/PostCompact hook calls) | Hook subscribers, telemetry. |
| Context-budget warnings (e.g. large agent descriptions) | UI and telemetry. |
Internal collaborators
| Collaborator | Direction | Contract |
|---|---|---|
| Runtime lifecycle | inbound | Provides a fully composed runtime context (settings, auth, MCP, plugins, agents, session). |
| Sessions module | inbound | Provides transcript history, restored permission/model state, deferred tools. |
| Tool/permission module | inbound + outbound | Supplies tool metadata for context; receives tool-use deltas and ask/deny decisions back. |
| MCP/plugins/hooks | inbound | Contribute prompts, resources, tool schemas, output styles, and lifecycle hooks. |
| Remote/bridge module | outbound | Receives the same stream-JSON frames the SDK does; permission/control frames flow back in. |
| Telemetry/ops | outbound | Receives tengu_* events for token usage, rate limits, retries, compaction, and budget exhaustion. |
Design decisions
- Static vs dynamic system-prompt boundary. The
__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__sentinel and--exclude-dynamic-system-prompt-sectionsflag exist so that per-machine fragments do not invalidate prompt caches. Treating cacheability as a first-class concern is a deliberate context-engineering choice. - Layered context, not single template. Memory files, settings, slash commands, skills, agents, MCP, tools, and session history are independent contributors; the assembler merges them per turn instead of relying on one monolithic template.
- Provider classifier in one place. Instead of having each call site detect the provider, environment gates (
CLAUDE_CODE_USE_*) are checked once and downstream code consumes a single classifier result. - Credential resolution by precedence, not branching. API key, OAuth token, helper script, and file-descriptor sources are tried in a fixed order so the rest of the loop can treat credentials as opaque.
- Headless mode is a different projection, not a different agent.
HeadlessRunnerandHeadlessFrameMultiplexerreuse the same context assembly and provider router as the TUI; only the projection (stream-JSON frames vs UI updates) differs. - Frame multiplexing keeps non-model state observable. Rate limit events, prompt suggestions, session-state changes, transcript mirrors, and bridge state are first-class outbound frames so SDK/remote consumers do not have to infer them.
- Compaction is a runtime concern, not a settings flag.
PreCompact/PostCompacthooks let external code participate in context shrinking; budget/turn limits are enforced inside the loop rather than at the model boundary. - Fallback model only in non-interactive paths. The
--fallback-modeldocumentation strings restrict fallback to print mode, which keeps interactive sessions predictable.
Failure modes
| Failure | Behavior |
|---|---|
Invalid format combination (e.g. --input-format=stream-json without --output-format=stream-json) | HeadlessRunner rejects with a precise error before any provider call. |
| Provider auth missing or expired | Credential resolver returns null; the loop reports a structured error frame and (in headless) exits with an error_during_execution subtype. |
| Rate limit hit | rate_limit_event frame emitted; provider state is preserved and the loop can re-issue or wait based on policy. |
| Turn or budget exhausted | result frame uses error_max_turns, error_max_budget_usd, or error_max_structured_output_retries so callers can distinguish stop conditions. |
| Context too large after assembly | Compaction is triggered through PreCompact/PostCompact hooks before the request is built; large agent descriptions also raise pre-flight warnings. |
| Stream interruption | The headless loop drains any in-flight tool calls; the SDK adapter explicitly ignores frame types it does not understand instead of crashing. |
Extension points
| Extension | How it plugs in |
|---|---|
| Add a context source | Contribute through settings/plugins/MCP rather than touching the assembler directly. |
| Add a provider | Add a CLAUDE_CODE_USE_* branch to the classifier and a credential adapter; existing model flags remain stable. |
| Add a new outbound frame type | Define the schema near the existing y.object(...) schemas (~line 2004) and emit it from HeadlessFrameMultiplexer; SDK adapters must opt into handling it. |
| Customize compaction | Subscribe to PreCompact/PostCompact hooks; do not mutate prompt assembly directly. |
| Override prompt cacheability | Use the dynamic-section flag rather than rewriting the prompt; this preserves provider-side caching. |
Caveats
- Detailed prompt fragments and templates are cataloged in Prompt template catalog; major runtime assembly shapes are reconstructed in Prompt assembly scenarios. They are runtime evidence, not authoritative prose.
- Provider adapter internals (request shaping, header mapping) are not fully recoverable from the bundle; this page documents the observable seams.
- The bundled Anthropic SDK contributes many strings (
session_id,/v1/sessions/...) that are SDK documentation/templates, not Claude Code lifecycle. They are only treated as runtime evidence when they connect to flags or loops.
Related docs
- Prompt, context, and memory
- Prompt assembly scenarios
- Context, memory, compaction, checkpoints, and rewind
- Prompt template catalog
- Models, providers, and auth
- Model selection, calls, usage, quota, and billing
- Headless streaming and resilience
- System architecture
- Tool runtime and security architecture
Created and maintained by Yingting Huang.