Context and model loop architecture

This page is the architecture analysis for the context-and-model-loop module. It complements the implementation pages in this chapter by focusing on how context is layered into a model request, how provider/auth selection is shaped, and how headless/SDK streaming is decomposed rather than re-listing prompt or template strings.

Scope: from a resolved runtime session and root-action options to a model-visible request, a streaming response, and the headless/SDK frame multiplex. Implementation specifics live in *Prompt, context, and memory*, *Prompt assembly scenarios*, *Context, memory, compaction, checkpoints, and rewind*, *Prompt template catalog*, *Models, providers, and auth*, *Model selection, calls, usage, quota, and billing*, and *Headless streaming and resilience*.

Module purpose

This module owns the request side of the agent loop: what the model sees, which provider serves the request, and how the streamed response is multiplexed into runtime state and (optionally) SDK/headless frames. It is intentionally separated from tool execution: this module decides what the model can perceive, while the tool/permission module decides what the model can do.

Architecture thesis

The context/model loop is a layered assembler plus a streaming multiplexer:

  • The assembler converts heterogeneous inputs (CLI flags, memory files, settings, plugins, MCP prompts/resources, tools, agents, session history) into a single model-visible request.
  • The multiplexer hides provider differences behind a shared streaming contract and exposes a uniform frame protocol for headless/SDK consumers.

This separation lets the runtime support interactive TUI and scripted/SDK transports with one context pipeline.
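The assembler side of this split can be sketched as independent context sources that each contribute named sections, which the assembler merges per turn. This is an illustrative shape, not cli.js code; `ContextSource`, `Section`, and `assembleRequest` are hypothetical names, and the static/dynamic flag anticipates the cacheability boundary discussed below.

```typescript
// Hypothetical sketch of the layered assembler: each source contributes
// sections, and the assembler merges them into one model-visible request.
type Section = { name: string; text: string; dynamic: boolean };

interface ContextSource {
  // e.g. memory files, settings, MCP prompts, tool metadata, transcripts
  sections(): Section[];
}

function assembleRequest(sources: ContextSource[], excludeDynamic: boolean): string {
  return sources
    .flatMap((s) => s.sections())
    .filter((s) => !excludeDynamic || !s.dynamic)
    .map((s) => `## ${s.name}\n${s.text}`)
    .join("\n\n");
}

// A stable memory-file source and a per-machine environment source:
const memory: ContextSource = {
  sections: () => [{ name: "memory", text: "Project rules apply.", dynamic: false }],
};
const env: ContextSource = {
  sections: () => [{ name: "env", text: "cwd: /tmp/project", dynamic: true }],
};

const full = assembleRequest([memory, env], false);   // includes per-machine content
const stable = assembleRequest([memory, env], true);  // cache-friendly subset
```

The key property is that sources stay independent: adding a plugin or MCP contributor means adding a source, not editing a monolithic template.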

Source anchors

| Semantic alias | Source | Approximate location | String or symbol | Architectural meaning |
|---|---|---|---|---|
| ManagedMemoryPolicy | cli.js | line ~185, byte 0x11e1cb | CLAUDE.md-style instructions injected as organization-managed memory | Managed memory schema; org policy participates in context. |
| LocalRuleMemoryRoots | cli.js | line ~1231, byte 0x49b032 | `.claude/rules`, `CLAUDE.local.md` | Rule and local memory file roots. |
| DynamicPromptBoundaryFlag | cli.js | line ~19525, byte 0xdc0fb6 | `--exclude-dynamic-system-prompt-sections` | Separates stable prompt content from per-machine sections. |
| SystemPromptOverrideFlag | cli.js | line ~19525, byte 0xdc0c89 | `--system-prompt <prompt>` | Replaces the system prompt. |
| SystemPromptAppendFlag | cli.js | line ~19525, byte 0xdc0d5f | `--append-system-prompt <prompt>` | Adds to the default system prompt. |
| OutputStyleContextSchema | cli.js | line ~185, byte 0x11087d | `outputStyles` | Plugin/settings-contributed output style schema. |
| SlashCommandContextSurface | cli.js | line ~4965, byte 0x924930 | `slashCommands` | Slash commands counted as context. |
| TranscriptContextAssembler | cli.js | line ~5579, byte 0x996ba7 | `async function _O5({transcriptPath:H,scope:$="session",maxRawTranscriptBytes:q})` | Transcript-derived context assembler. |
| ProviderClassifier | cli.js | line ~253, byte 0x1ed452 | `CLAUDE_CODE_USE_BEDROCK`, `..._VERTEX`, `..._FOUNDRY`, `..._MANTLE`, `..._ANTHROPIC_AWS` | Provider classifier branches. |
| CredentialResolver | cli.js | line ~43, byte 0x264c0 | `ANTHROPIC_API_KEY`, `ANTHROPIC_AUTH_TOKEN` | Credential resolution; key vs bearer differs in downstream headers. |
| ModelSelectionFlag | cli.js | line ~19525, byte 0xdc18ed | `--model <model>` | Per-session model selection. |
| FallbackModelFlag | cli.js | line ~19525, byte 0xdc1b54 | `--fallback-model <model>` | Fallback model for print/headless mode. |
| PerTurnModelResolver | cli.js | line ~253, byte 0x20ef88 | `nG({permissionMode,mainLoopModel,exceeds200kTokens})` | Per-turn model resolver can alter the model by mode/context. |
| ApiUsageAccounting | cli.js | line ~2027, byte 0x52af3a | `api_request`, `cost_usd`, `input_tokens`, `output_tokens` | Provider-call accounting and telemetry. |
| UnifiedRateLimitHeaders | cli.js | line ~793, byte 0x4406aa | `anthropic-ratelimit-unified-*` | Unified rate-limit/quota headers parsed into runtime state. |
| HeadlessBudgetGuard | cli.js | line ~19323, byte 0xda0191 | `error_max_budget_usd` | Headless budget guard result subtype. |
| HeadlessMcpCoordinator | cli.js | line ~19543, byte 0xdc73fa | `let o4=fH9({regularMcpConfigs:Ww` | Headless branch creates the MCP coordinator before the model loop. |
| HeadlessRunner | cli.js | line ~19324, byte 0xda31bb | `async function T7A` | Headless runner; validates print/SDK constraints. |
| HeadlessFrameMultiplexer | cli.js | line ~19349, byte 0xda50d0 | `function H89` | Headless streaming/control multiplexer. |
| HeadlessOutboundChannel | cli.js | line ~19349, byte 0xda50d0 | `let h=H.outbound` | Outbound stream/channel abstraction inside `H89`. |
| RateLimitStreamFrame | cli.js | line ~2004, byte 0x516307 | `rate_limit_event` | Rate-limit changes projected to SDK consumers. |
| PromptSuggestionFrame | cli.js | line ~2004, byte 0x519daf | `prompt_suggestion` | Predicted next-prompt frame emitted after a turn. |
| SessionStateChangedFrame | cli.js | line ~7240, byte 0xb2d4fc | `session_state_changed` | Idle/running/requires_action state pushed alongside model frames. |
| TranscriptMirrorFrame | cli.js | line ~7240, byte 0xb2d257 | `transcript_mirror` | Local transcript mirror frame in stream-JSON mode. |
| SdkFrameAdapterFilter | cli.js | line ~9434, byte 0xc5950c | `case "rate_limit_event": return N("[sdkMessageAdapter] Ignoring rate_limit_event message")` | SDK adapter explicitly handles a subset of frame types. |
| CompactionHookLifecycle | cli.js | line ~185, byte 0x10b77c | `PreCompact`, `PostCompact` | Compaction lifecycle hooks around context shrinking. |
| AutoCompactionThreshold | cli.js | line ~4963, byte 0x921035 | `autoCompactEnabled`, `DISABLE_AUTO_COMPACT`, `autocompact: tokens=` | Auto-compaction gate and threshold path. |

Internal decomposition

```mermaid
flowchart TD
  Inputs[CLI flags + settings + memory + plugins + MCP + agents + session history] --> Sources[Context sources]
  Sources --> Stable[Stable system-prompt sections]
  Sources --> Dynamic[Dynamic per-machine sections]
  Sources --> Tools[Tool/agent/MCP metadata]
  Sources --> Transcript[Recent + subagent transcripts]
  Stable --> Assembler[Prompt assembler]
  Dynamic --> Assembler
  Tools --> Assembler
  Transcript --> Assembler
  Assembler --> Compaction[Compaction policy and budget guard]
  Compaction --> Request[Provider request]
  Auth[Credential resolver] --> ProviderRouter[Provider classifier]
  Models[Model flags / aliases / fallback] --> ProviderRouter
  ProviderRouter --> Request
  Request --> ModelStream[Streaming response]
  ModelStream --> ToolUse[Tool-use deltas to permission boundary]
  ModelStream --> TextUI[Assistant text to TUI / stream-json]
  ModelStream --> Mux[Headless frame multiplexer]
  Mux --> Result[final result frame]
  Mux --> RateLimit[rate_limit_event]
  Mux --> Suggestions[prompt_suggestion]
  Mux --> State[session_state_changed]
  Mux --> Mirror[transcript_mirror]
  Mux --> Bridge[bridge_state]
```

The module composes three sub-components:

| Sub-component | Responsibility |
|---|---|
| Context sources | Heterogeneous inputs (memory, settings, plugins, MCP, agents, tools, session history, output styles, slash commands). |
| Prompt assembler + compaction | Produces the model-visible request, applies `--exclude-dynamic-system-prompt-sections`, runs PreCompact/PostCompact hooks, and enforces budget/turn limits. |
| Provider/auth router | Picks credentials, classifies the provider, sets model/fallback, prepares headers, and abstracts over Anthropic/Bedrock/Vertex/Foundry/Mantle/Anthropic AWS. |

The HeadlessFrameMultiplexer wraps the model stream and adds non-model frames (rate limit, suggestions, state, transcript mirror, bridge state) without coupling them to provider details.
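The multiplexer's shape can be sketched as an async generator that interleaves side-channel frames with model deltas and closes with a final result frame. The frame type names match this page; the generator itself (`multiplex`) and the simple queue-draining policy are illustrative assumptions, not recovered cli.js logic.

```typescript
// Illustrative multiplexer: wrap a model text stream, interleave non-model
// frames, and terminate with a result frame. Not the actual H89 code.
type Frame =
  | { type: "assistant"; text: string }
  | { type: "rate_limit_event"; status: string }
  | { type: "session_state_changed"; state: "idle" | "running" }
  | { type: "result"; subtype: "success" | "error_max_turns" };

async function* multiplex(
  model: AsyncIterable<string>,
  side: Frame[], // queue of pending non-model frames (rate limits, state)
): AsyncGenerator<Frame> {
  yield { type: "session_state_changed", state: "running" };
  for await (const text of model) {
    yield { type: "assistant", text };
    // Drain any side-channel frames that arrived between model deltas.
    while (side.length > 0) yield side.shift()!;
  }
  yield { type: "session_state_changed", state: "idle" };
  yield { type: "result", subtype: "success" };
}
```

A consumer sees one ordered frame stream regardless of which provider produced the underlying deltas, which is the decoupling the page describes.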

Public interface

Inputs

| Source | Effect |
|---|---|
| `--system-prompt[-file]`, `--append-system-prompt[-file]` | Replace or extend the system prompt. |
| `--exclude-dynamic-system-prompt-sections` | Moves per-machine content (cwd, env, memory paths, git status) out of cache-sensitive sections. |
| `--add-dir`, `--file` | Add tool-access directories and inject file resources into early context. |
| `--model`, `--fallback-model`, `--effort`, `--thinking*`, `--max-turns`, `--max-budget-usd`, `--task-budget`, `--betas` | Shape provider routing, thinking mode, budget guards, and beta headers. |
| `CLAUDE.md`, `CLAUDE.local.md`, `.claude/rules`, `.claude/settings.json`, managed settings, `outputStyles` | Memory and presentation layers fed into the assembler. |
| `ANTHROPIC_API_KEY`, `ANTHROPIC_AUTH_TOKEN`, OAuth/file-descriptor token vars, provider `CLAUDE_CODE_USE_*` gates | Credential and provider classification. |
| Slash commands, skills, custom agents, MCP prompts/resources, tools | Add capability metadata and prompt fragments. |
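The static/dynamic split driven by `--exclude-dynamic-system-prompt-sections` can be illustrated with the `__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__` sentinel described under design decisions: everything before the sentinel is cache-stable, everything after is per-machine. `splitAtBoundary` is a hypothetical helper, not a cli.js symbol.

```typescript
// Illustrative split at the dynamic-boundary sentinel. Content before the
// sentinel can be cached by the provider; content after it varies per machine.
const BOUNDARY = "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__";

function splitAtBoundary(prompt: string): { stable: string; dynamic: string } {
  const i = prompt.indexOf(BOUNDARY);
  if (i < 0) return { stable: prompt, dynamic: "" }; // no dynamic tail
  return {
    stable: prompt.slice(0, i),
    dynamic: prompt.slice(i + BOUNDARY.length),
  };
}
```

With the flag set, only the `stable` half would be sent as cache-sensitive prompt content; the `dynamic` half (cwd, env, git status) is handled separately so it cannot invalidate the cache.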

Outputs

| Output | Consumer |
|---|---|
| Provider request | Streaming model API (Anthropic, Bedrock, Vertex, Foundry, Mantle, Anthropic AWS). |
| `assistant` and `tool_use` deltas | Forwarded to the TUI renderer or stream-JSON adapter. |
| Headless frames (`result`, `rate_limit_event`, `prompt_suggestion`, `session_state_changed`, `transcript_mirror`, `bridge_state`, `task_notification`, `plugin_install`) | Headless/SDK consumers, transcript writers, remote bridge. |
| Compaction events (PreCompact/PostCompact hook calls) | Hook subscribers, telemetry. |
| Context-budget warnings (e.g. large agent descriptions) | UI and telemetry. |

Internal collaborators

| Collaborator | Direction | Contract |
|---|---|---|
| Runtime lifecycle | inbound | Provides a fully composed runtime context (settings, auth, MCP, plugins, agents, session). |
| Sessions module | inbound | Provides transcript history, restored permission/model state, deferred tools. |
| Tool/permission module | inbound + outbound | Supplies tool metadata for context; receives tool-use deltas and ask/deny decisions back. |
| MCP/plugins/hooks | inbound | Contribute prompts, resources, tool schemas, output styles, and lifecycle hooks. |
| Remote/bridge module | outbound | Receives the same stream-JSON frames the SDK does; permission/control frames flow back in. |
| Telemetry/ops | outbound | Receives `tengu_*` events for token usage, rate limits, retries, compaction, and budget exhaustion. |

Design decisions

  1. Static vs dynamic system-prompt boundary. The __SYSTEM_PROMPT_DYNAMIC_BOUNDARY__ sentinel and --exclude-dynamic-system-prompt-sections flag exist so that per-machine fragments do not invalidate prompt caches. Treating cacheability as a first-class concern is a deliberate context-engineering choice.
  2. Layered context, not single template. Memory files, settings, slash commands, skills, agents, MCP, tools, and session history are independent contributors; the assembler merges them per turn instead of relying on one monolithic template.
  3. Provider classifier in one place. Instead of having each call site detect the provider, environment gates (CLAUDE_CODE_USE_*) are checked once and downstream code consumes a single classifier result.
  4. Credential resolution by precedence, not branching. API key, OAuth token, helper script, and file-descriptor sources are tried in a fixed order so the rest of the loop can treat credentials as opaque.
  5. Headless mode is a different projection, not a different agent. HeadlessRunner and HeadlessFrameMultiplexer reuse the same context assembly and provider router as the TUI; only the projection (stream-JSON frames vs UI updates) differs.
  6. Frame multiplexing keeps non-model state observable. Rate limit events, prompt suggestions, session-state changes, transcript mirrors, and bridge state are first-class outbound frames so SDK/remote consumers do not have to infer them.
  7. Compaction is a runtime concern, not a settings flag. PreCompact/PostCompact hooks let external code participate in context shrinking; budget/turn limits are enforced inside the loop rather than at the model boundary.
  8. Fallback model only in non-interactive paths. The --fallback-model documentation strings restrict fallback to print mode, which keeps interactive sessions predictable.
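Decisions 3 and 4 can be sketched together: provider gates are checked once in a single classifier, and credentials are resolved in a fixed precedence order. The env-var names come from the source anchors on this page; the gate ordering and the helper names (`classifyProvider`, `resolveCredential`) are assumptions for illustration.

```typescript
// Illustrative sketch (not cli.js code) of single-point provider
// classification and fixed-precedence credential resolution.
type Env = Record<string, string | undefined>;
type Provider = "bedrock" | "vertex" | "anthropic";

function classifyProvider(env: Env): Provider {
  if (env.CLAUDE_CODE_USE_BEDROCK) return "bedrock";
  if (env.CLAUDE_CODE_USE_VERTEX) return "vertex";
  // Foundry/Mantle/Anthropic-AWS gates would follow the same pattern.
  return "anthropic";
}

type Credential = { kind: "api_key" | "bearer"; value: string } | null;

function resolveCredential(env: Env): Credential {
  // Fixed precedence: API key first, then bearer token; the kind decides
  // which auth header the downstream request builder emits.
  if (env.ANTHROPIC_API_KEY) return { kind: "api_key", value: env.ANTHROPIC_API_KEY };
  if (env.ANTHROPIC_AUTH_TOKEN) return { kind: "bearer", value: env.ANTHROPIC_AUTH_TOKEN };
  // OAuth, helper-script, and file-descriptor sources would be tried next.
  return null;
}
```

The payoff of this shape is that the rest of the loop consumes one `Provider` value and one opaque `Credential` instead of re-checking environment state at every call site.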

Failure modes

| Failure | Behavior |
|---|---|
| Invalid format combination (e.g. `--input-format=stream-json` without `--output-format=stream-json`) | HeadlessRunner rejects with a precise error before any provider call. |
| Provider auth missing or expired | The credential resolver returns null; the loop reports a structured error frame and (in headless mode) exits with an `error_during_execution` subtype. |
| Rate limit hit | A `rate_limit_event` frame is emitted; provider state is preserved and the loop can re-issue or wait based on policy. |
| Turn or budget exhausted | The `result` frame uses `error_max_turns`, `error_max_budget_usd`, or `error_max_structured_output_retries` so callers can distinguish stop conditions. |
| Context too large after assembly | Compaction is triggered through PreCompact/PostCompact hooks before the request is built; large agent descriptions also raise pre-flight warnings. |
| Stream interruption | The headless loop drains any in-flight tool calls; the SDK adapter explicitly ignores frame types it does not understand instead of crashing. |
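The stop-condition subtypes above lend themselves to a discriminated union on the consumer side. A minimal sketch, assuming a `ResultFrame` shape (the subtype strings appear on this page; the frame shape and `stopReason` helper are hypothetical):

```typescript
// Illustrative consumer-side handling of result subtypes. The union makes
// the switch exhaustive, so adding a subtype forces callers to handle it.
type ResultFrame = {
  type: "result";
  subtype:
    | "success"
    | "error_max_turns"
    | "error_max_budget_usd"
    | "error_max_structured_output_retries"
    | "error_during_execution";
};

function stopReason(frame: ResultFrame): string {
  switch (frame.subtype) {
    case "success": return "completed";
    case "error_max_turns": return "turn limit reached";
    case "error_max_budget_usd": return "budget exhausted";
    case "error_max_structured_output_retries": return "structured output failed";
    case "error_during_execution": return "runtime error";
  }
}
```

Because each stop condition is a distinct subtype rather than a generic error string, scripted callers can branch on it without parsing messages.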

Extension points

| Extension | How it plugs in |
|---|---|
| Add a context source | Contribute through settings/plugins/MCP rather than touching the assembler directly. |
| Add a provider | Add a `CLAUDE_CODE_USE_*` branch to the classifier and a credential adapter; existing model flags remain stable. |
| Add a new outbound frame type | Define the schema near the existing `y.object(...)` schemas (~line 2004) and emit it from HeadlessFrameMultiplexer; SDK adapters must opt into handling it. |
| Customize compaction | Subscribe to PreCompact/PostCompact hooks; do not mutate prompt assembly directly. |
| Override prompt cacheability | Use the dynamic-section flag rather than rewriting the prompt; this preserves provider-side caching. |
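The opt-in rule for new frame types mirrors the SdkFrameAdapterFilter anchor: adapters pass through only the types they know and drop the rest, so introducing a frame cannot crash existing consumers. A minimal sketch of that tolerance (the `HANDLED` set and `handleFrame` helper are illustrative):

```typescript
// Illustrative tolerant adapter: unknown frame types are dropped, not fatal.
type AnyFrame = { type: string; [key: string]: unknown };

const HANDLED = new Set(["assistant", "result", "session_state_changed"]);

function handleFrame(frame: AnyFrame): AnyFrame | null {
  if (!HANDLED.has(frame.type)) {
    // Real adapter logs something like
    // "[sdkMessageAdapter] Ignoring rate_limit_event message" here.
    return null;
  }
  return frame;
}
```

Under this contract, emitting a brand-new frame type from the multiplexer is backward compatible by construction; each adapter upgrades on its own schedule by adding the type to its handled set.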

Caveats

  • Detailed prompt fragments and templates are cataloged in Prompt template catalog; major runtime assembly shapes are reconstructed in Prompt assembly scenarios. They are runtime evidence, not authoritative prose.
  • Provider adapter internals (request shaping, header mapping) are not fully recoverable from the bundle; this page documents the observable seams.
  • The bundled Anthropic SDK contributes many strings (session_id, /v1/sessions/...) that are SDK documentation/templates, not Claude Code lifecycle. They are only treated as runtime evidence when they connect to flags or loops.

Created and maintained by Yingting Huang.