Operations and native-support architecture

This page is the architecture analysis for the hosted-agent-ops module. It complements the implementation pages: rather than re-listing each event name, it focuses on how diagnostics, telemetry, updates, and native helpers sit around the main runtime without entering its inner loop.

Scope: debug logs, telemetry/traffic gates, feature flags, doctor/update tooling, crash and error reporting, hosted review signals, and embedded image/audio native helpers. Implementation specifics live in Diagnostics and debug logs, Telemetry and tracing, Feature gates reference, Updater and doctor, and Media native modules.

Module purpose

This module owns everything that observes or maintains the runtime without owning a model turn. It exists so the inner loop (lifecycle → context → tools → sessions) can stay focused while diagnostics, telemetry, updates, and binary support live behind narrow seams.

It deliberately does not own:

  • Tool execution (owned by the tools/security module).
  • Session persistence (owned by the sessions module).
  • Permission decisions (owned by the trust pipeline).

It only emits signals about them.

Architecture thesis

Ops is a passive periphery: it surfaces structured events, exposes user-facing maintenance commands, and provides binary helpers. Every ops surface is gated by a managed setting, an environment variable, or a CLI flag, so the runtime can be operated with strict observability or with no telemetry at all.
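
The gating idea can be made concrete with a minimal sketch. Everything here is an assumption for illustration: the function names, the shape of the gate object, and in particular the precedence order (managed setting, then environment variable, then CLI flag, then default) are not taken from cli.js.

```javascript
// Hypothetical gate check; names and precedence are illustrative assumptions.
// Assumed precedence: managed setting > environment variable > CLI flag > default.

/** Treat common "on" spellings as true; anything else (including unset) as false. */
function isTruthyEnv(value) {
  return ["1", "true", "yes", "on"].includes(String(value ?? "").trim().toLowerCase());
}

/**
 * Decide whether an ops surface may emit.
 * @param {{managedDisabled?: boolean, envDisableVar?: string, flagEnabled?: boolean, defaultEnabled?: boolean}} gate
 * @param {Record<string, string | undefined>} env
 */
function isSurfaceEnabled(gate, env) {
  if (gate.managedDisabled === true) return false; // managed policy wins outright
  if (gate.envDisableVar && isTruthyEnv(env[gate.envDisableVar])) return false;
  if (typeof gate.flagEnabled === "boolean") return gate.flagEnabled;
  return gate.defaultEnabled ?? true;
}

const telemetryGate = { envDisableVar: "DISABLE_TELEMETRY", defaultEnabled: true };
const silent = isSurfaceEnabled(telemetryGate, { DISABLE_TELEMETRY: "1" });
const observable = isSurfaceEnabled(telemetryGate, {});
console.log({ silent, observable });
```

Passing the environment in as a plain object, rather than reading process.env inside the function, keeps the check easy to exercise for both the strictly silent and the strictly observable configuration.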

Source anchors

| Semantic alias | Source | Approximate location | String or symbol | Architectural meaning |
| --- | --- | --- | --- | --- |
| NativeUpdaterStartEvent | cli.js | line ~9355, byte 0xc257d7 | tengu_native_auto_updater_start | Updater entry; runs out-of-band from the model loop. |
| NativeUpdaterLockEvent | cli.js | line ~9355, byte 0xc25b5c | tengu_native_auto_updater_lock_contention | Updater lock telemetry; multiple invocations are coordinated. |
| NativeUpdaterFailureEvent | cli.js | line ~9355, byte 0xc25d87 | tengu_native_auto_updater_fail | Updater failure classification (timeout/checksum/not_found). |
| NativeUpdaterSuccessEvent | cli.js | line ~9355, byte 0xc25e2c | tengu_native_auto_updater_success | Updater success telemetry. |
| AutoUpdateReleaseChannel | cli.js | line ~185, byte 0x11d275 | auto-update (release channel: latest, stable, rc) | Settings-driven release channel selection. |
| UpdaterPermissionPreflight | cli.js | line ~2741, byte 0x73879e | Insufficient permissions for auto-updates | Doctor-style preflight for updater. |
| AutoUpdaterStatusMachine | cli.js | line ~433, byte 0x302b03 | autoUpdaterStatus: migrated, installed, disabled, enabled | State machine for updater install kind. |
| DoctorDiagnosticsScreen | cli.js | line ~605, byte 0x38fcfe | /doctor diagnostics screen | Doctor UX entry. |
| ShutdownCoordinator | cli.js | line ~1676, byte 0x4fb763 | recordUncaughtAndCheckBreaker, gracefulShutdown, flushAnalyticsSinks | Ops/shutdown coordinator surface. |
| EventLoopStallDetector | cli.js | line ~19294, byte 0xd9426a | startEventLoopStallDetector | Optional diagnostic added at the bootstrap layer. |
| StartupProfilingMarks | cli.js | line ~64, byte 0x33865 | import_time, cli_entry, main_tsx_imports_loaded, cli_before_main_import | Startup profiling event groups. |
| ProviderAndErrorGates | cli.js | line ~133, byte 0xea615 | CLAUDE_CODE_USE_VERTEX, CLAUDE_CODE_USE_FOUNDRY, CLAUDE_CODE_USE_ANTHROPIC_AWS, CLAUDE_CODE_USE_MANTLE, DISABLE_ERROR_REPORTING | Provider/error-reporting gates evaluated up front. |
| TranscriptMirrorFrame | cli.js | line ~9434, byte 0xc59a40 | transcript_mirror | Ops-friendly local mirror of remote transcripts. |
| SubagentStatusLineSchema | cli.js | line ~185, byte 0x11b5b2 | subagentStatusLine | Status-line schema for subagent UX. |
| OpsNotificationStateFlags | cli.js | line ~11, byte 0x7c3b | markFirstTeleportMessageLogged, isSessionPersistenceDisabled, isUserActiveForNotifications | Cross-cutting state flags observed by ops/notifications. |
| MediaNativeJsShims | claude-code-pkg/image-processor.js, claude-code-pkg/audio-capture.js | line ~11 | require("/$bunfs/root/*.node") | JS shims for embedded native helpers; native .node files are no longer retained in the final artifact layout. |

Internal decomposition

flowchart TD
    Runtime["Claude Code runtime"] --> Debug["Debug log writers"]
    Runtime --> Telemetry["Telemetry sinks"]
    Runtime --> Errors["Error reporter / breaker"]
    Runtime --> Doctor["/doctor diagnostics"]
    Runtime --> Updater["Native auto updater"]
    Runtime --> Hosted["Hosted review signals"]
    Runtime --> Statusline["Status line / subagent status line"]
    Runtime --> Native["Image / audio native helpers"]
    Debug --> Logs["~/.claude/debug-logs"]
    Telemetry --> Sink["analytics sinks"]
    Errors --> Sink
    Updater --> Channel["release channel: latest / stable / rc"]
    Hosted --> Preflight["/v1/ultrareview/preflight"]
    Native --> Attach["image / audio attachment inputs"]
    Policy["managed settings + env gates"] --> Telemetry
    Policy --> Errors
    Policy --> Hosted
    Policy --> Updater

| Sub-component | Responsibility |
| --- | --- |
| Debug log writers | Append-only logs for support; respect debug log gating. |
| Telemetry sinks | Emit tengu_* events; flushed by the shutdown coordinator. |
| Error reporter | Record uncaught/breaker state; gated by DISABLE_ERROR_REPORTING. |
| Doctor | User-facing diagnostics surface for environment, permissions, model selection, integrations. |
| Native auto updater | Out-of-process update with a release-channel and install-kind state machine. |
| Hosted review signals | ultrareview-adjacent preflight and result hooks; gated by hosted settings/policy. |
| Status line / subagent status line | Optional command-derived status line rendered around the loop. |
| Native helpers | Original payload includes image-processor.node and audio-capture.node loaded via JS shims; final artifacts retain only the shims. |

Public interface

Inputs

| Surface | Effect |
| --- | --- |
| --debug, --debug-to-stderr, --ai-debug | Enable debug logging variants. |
| --add-trace-attribute key=value | Attach OTEL attributes to runtime traces. |
| DISABLE_TELEMETRY, DISABLE_ERROR_REPORTING, DISABLE_AUTOUPDATER | Coarse env gates for telemetry, errors, updater. |
| OTEL_* env | Wire OTEL sinks if configured. |
| claude doctor | Run the diagnostics path. |
| claude update, claude install | Manual updater entry. |
| Settings: cleanupPeriodDays, statusLine, subagentStatusLine, auto-update channel/min version | Persistent ops configuration. |
| Managed policy: disableAllHooks, disableRemoteControl, disableAgentView, disableSkillShellExecution | Capability/policy switches surfaced through ops UX. |
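
As an illustration of the --add-trace-attribute key=value input, the sketch below collects repeated flags into an attribute map. Only the flag name comes from this page; the function name and the choices made here (split on the first "=", reject a missing key, last value wins) are assumptions, and the real CLI may behave differently.

```javascript
// Illustrative parser; the real CLI's handling of repeated or malformed values
// is not documented on this page, so the rules below are assumptions.
function parseTraceAttributes(argv) {
  const attributes = {};
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] !== "--add-trace-attribute") continue;
    const pair = argv[i + 1];
    const eq = pair === undefined ? -1 : pair.indexOf("=");
    if (eq <= 0) {
      throw new Error(`expected key=value after --add-trace-attribute, got: ${pair}`);
    }
    attributes[pair.slice(0, eq)] = pair.slice(eq + 1); // split on the first "=" only
    i++; // skip the value that was just consumed
  }
  return attributes;
}

const parsedAttributes = parseTraceAttributes([
  "--debug",
  "--add-trace-attribute", "team=infra",
  "--add-trace-attribute", "query=a=b",
]);
console.log(parsedAttributes);
```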

Outputs

| Output | Consumer |
| --- | --- |
| tengu_* event stream | Telemetry sink (when enabled). |
| Debug log files | Support tooling. |
| Doctor render | Terminal UX. |
| Updater state transitions | Settings/state file + telemetry. |
| Crash/error reports | Error sink (when enabled). |
| Status-line strings | Terminal UX. |
| Image/audio buffers | Attachment paths in the context/model loop. |

Internal collaborators

| Collaborator | Contract |
| --- | --- |
| Runtime lifecycle | Calls into ops in TopLevelMain (event-loop stall detector, profiling marks) and in preAction (sinks/logs/managed settings refresh). |
| Settings/policy | Provides the gates ops checks before emitting or persisting. |
| Sessions | Receives session_state_changed, transcript-mirror, and bridge-state frames that ops surfaces or logs. |
| Tools/security | Emits tool decision telemetry; ops aggregates and persists. |
| Updater backend | External binary fetch + checksum verification; result is recorded in settings/state. |
| Hosted review backend | /v1/ultrareview/preflight and related routes. |
| Native helpers | Bun resolves require("/$bunfs/root/...node") for image/audio addons. |
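
The native-helper contract can be pictured as a thin wrapper that tolerates a missing addon. This is a sketch under stated assumptions: `load` stands in for the shim's require(...) call, and the `process` method on the addon is hypothetical, since the addons' real exports were not analyzed.

```javascript
// Hypothetical shim wrapper. `load` stands in for the require(...) call made by
// the real JS shim; the addon's `process` method is an assumed name.
function createImageHelper(load) {
  let addon = null;
  try {
    addon = load();
  } catch {
    addon = null; // missing or unloadable addon: degrade instead of crashing
  }
  return {
    available: addon !== null,
    // Returns a buffer for the attachment path, or null when the addon is absent.
    process(input) {
      return addon ? addon.process(input) : null;
    },
  };
}

const missingHelper = createImageHelper(() => {
  throw new Error("Cannot find module");
});
const presentHelper = createImageHelper(() => ({ process: (bytes) => bytes }));
console.log(missingHelper.available, presentHelper.available);
```

Returning null rather than throwing keeps a load failure local to media attachments, so non-media flows continue unaffected.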

Design decisions

  1. Ops is observation, not control. The shutdown coordinator can flush analytics and disarm orphan handlers, but it does not interrupt model turns. Hard control still belongs to the lifecycle module.
  2. All telemetry is gated. Managed settings, env vars, and CLI flags all participate; this is intentional so deployments can be strictly observable or strictly silent.
  3. The updater runs out-of-band. Lock-contention telemetry shows the updater is designed for multi-invocation safety; it never blocks a running model turn.
  4. Doctor is the canonical diagnostics surface. Other diagnostic frames (status line, debug logs) are complementary; doctor is the place to converge for support.
  5. Native helpers are isolated. They are loaded by tiny JS shims (image-processor.js, audio-capture.js) and never participate in the trust pipeline or session state directly; they only produce buffers consumed by the context plane.
  6. Error reporting is toggled at a coarse grain. DISABLE_ERROR_REPORTING short-circuits the whole reporter rather than reshaping individual call sites.
  7. Profiling marks are part of the lifecycle, not a separate framework. import_time, cli_entry, main_function_start, run_function_start, preAction_* marks all flow through the same logger so support can read a single timeline.
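
Decision 7 can be sketched as a single recorder that every startup phase writes to. The mark names are the ones listed above; the recorder API and the injectable clock are assumptions made for illustration.

```javascript
// Hypothetical mark recorder; the mark names come from this page, the API shape is assumed.
function createTimeline(now = () => performance.now()) {
  const marks = [];
  return {
    mark(name) {
      marks.push({ name, at: now() });
    },
    // Report each mark with its offset from the first mark and from the previous one.
    report() {
      return marks.map((m, i) => ({
        name: m.name,
        sinceStartMs: m.at - marks[0].at,
        sincePrevMs: i === 0 ? 0 : m.at - marks[i - 1].at,
      }));
    },
  };
}

// Deterministic fake clock for the example: advances 5 ms per reading.
let fakeClockMs = 0;
const timeline = createTimeline(() => (fakeClockMs += 5));
for (const name of ["import_time", "cli_entry", "main_function_start", "run_function_start"]) {
  timeline.mark(name);
}
const timelineReport = timeline.report();
console.log(timelineReport);
```

Because all marks share one recorder and one clock, the report reads as a single ordered timeline rather than several per-subsystem logs.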

Operational seams

| Seam | What flows across | Why it exists |
| --- | --- | --- |
| Early process-exit hook | Final flush / cleanup | Catch-all to give sinks one last chance to drain. |
| process.on("SIGINT", ...) | Headless vs interactive shutdown branching | Different ops UX for scripted vs human runs. |
| gracefulShutdown / gracefulShutdownSync | Coordinated flush + analytics | Keep telemetry consistent on planned termination. |
| recordUncaughtAndCheckBreaker | Uncaught exception path | Centralize crash classification, avoid noisy duplicates. |
| tengu_native_auto_updater_* events | Updater telemetry | Externalize updater health without coupling to UI. |
| auto-update settings + state | Channel + install kind | Persistent state for the updater state machine. |
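
The uncaught-exception seam combines two ideas: suppress duplicate reports and trip a breaker when crashes cluster. The sketch below shows one way that could work. It is not the implementation behind recordUncaughtAndCheckBreaker; the thresholds, the fingerprint, and the return shape are all assumptions.

```javascript
// Hypothetical crash breaker; thresholds and the dedupe fingerprint are assumptions.
function createCrashBreaker({ maxCrashes = 3, windowMs = 60_000, now = Date.now } = {}) {
  const seen = new Set(); // fingerprints that were already reported
  let recent = [];        // timestamps of uncaught errors inside the window
  return function recordUncaught(error) {
    const ts = now();
    recent = recent.filter((t) => ts - t < windowMs);
    recent.push(ts);
    const fingerprint = `${error.name}:${error.message}`;
    const duplicate = seen.has(fingerprint);
    seen.add(fingerprint);
    // report: send to the error sink only once per fingerprint.
    // tripped: too many crashes in the window, caller should shut down.
    return { report: !duplicate, tripped: recent.length >= maxCrashes };
  };
}

const recordUncaught = createCrashBreaker({ maxCrashes: 2, now: () => 1_000 });
const firstCrash = recordUncaught(new TypeError("boom"));
const secondCrash = recordUncaught(new TypeError("boom"));
console.log(firstCrash, secondCrash);
```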

Failure modes

| Failure | Behavior |
| --- | --- |
| Updater times out, checksum fails, or binary not found | Classified failure event emitted; existing install remains usable. |
| Updater lock already held by another process | lock_contention event; no second updater runs. |
| Auto-update permission insufficient | Doctor preflight reports the fix-up message instead of silently failing. |
| Telemetry sink unreachable | Events buffer in-memory; shutdown flush still attempts delivery. |
| Native helper missing or fails to load | Attachment paths degrade; non-media flows are unaffected. |
| Status-line command errors | Status line is suppressed; loop continues. |
| Event-loop stall detector triggers | Diagnostic events emitted; runtime continues. |
| Hosted review preflight rejects | UX surfaces the result; local workflow is not blocked. |
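
The two updater rows can be traced through a small control-flow sketch. The event names are the tengu_native_auto_updater_* strings cited on this page. The injected callbacks, the `reason` payload field, the error `code` convention, and the point at which the start event fires are assumptions. The sketch is synchronous for brevity, whereas a real fetch would be asynchronous.

```javascript
// Hypothetical updater wrapper; lock, download, and emit are injected stand-ins.
function runUpdaterOnce({ tryAcquireLock, releaseLock, downloadAndVerify, emit }) {
  emit("tengu_native_auto_updater_start");
  if (!tryAcquireLock()) {
    emit("tengu_native_auto_updater_lock_contention");
    return "skipped"; // another process holds the lock; no second updater runs
  }
  try {
    downloadAndVerify();
    emit("tengu_native_auto_updater_success");
    return "updated";
  } catch (error) {
    // Assumed mapping from error codes to the three classes named on this page.
    const known = ["timeout", "checksum", "not_found"];
    const reason = known.includes(error.code) ? error.code : "unknown";
    emit("tengu_native_auto_updater_fail", { reason });
    return "failed"; // the existing install is left untouched
  } finally {
    releaseLock();
  }
}

const updaterEvents = [];
const emitUpdaterEvent = (name, data = {}) => updaterEvents.push({ name, ...data });
const checksumError = Object.assign(new Error("digest mismatch"), { code: "checksum" });

const contended = runUpdaterOnce({
  tryAcquireLock: () => false,
  releaseLock() {},
  downloadAndVerify() {},
  emit: emitUpdaterEvent,
});
const failed = runUpdaterOnce({
  tryAcquireLock: () => true,
  releaseLock() {},
  downloadAndVerify() { throw checksumError; },
  emit: emitUpdaterEvent,
});
console.log(contended, failed, updaterEvents.map((e) => e.name));
```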

Extension points

| Extension | How it plugs in |
| --- | --- |
| Additional telemetry sink | Register through the analytics-sink interface and rely on flushAnalyticsSinks. |
| Additional debug log channel | Use existing logger; do not invent a new file format. |
| New diagnostic check | Add to the doctor command rather than scattering checks across the runtime. |
| Status line customization | Use the statusLine / subagentStatusLine settings; treat as commands, not inline code. |
| Custom updater channel | Add to the auto-update settings enum; updater logic should not branch on out-of-band sources. |
| Native attachment type | Add a JS shim and a .node addon; attach via the existing attachment surface in the context module. |
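
For the first extension point, a sink registry might look roughly like the sketch below. The analytics-sink interface itself is not documented on this page, so the write/flush contract, the function names, and the event name tengu_example_event are all hypothetical.

```javascript
// Hypothetical analytics-sink registry; the write/flush contract is an assumption.
const registeredSinks = [];

function registerSink(sink) {
  if (typeof sink.write !== "function" || typeof sink.flush !== "function") {
    throw new TypeError("sink must implement write(event) and flush()");
  }
  registeredSinks.push(sink);
}

function emitEvent(event) {
  for (const sink of registeredSinks) sink.write(event);
}

// One failing sink must not prevent the others from draining.
function flushAllSinks() {
  const failures = [];
  for (const sink of registeredSinks) {
    try {
      sink.flush();
    } catch (error) {
      failures.push(error);
    }
  }
  return failures;
}

// A buffering sink that only delivers on flush.
function createMemorySink() {
  const buffer = [];
  const delivered = [];
  return {
    delivered,
    write(event) { buffer.push(event); },
    flush() { delivered.push(...buffer.splice(0)); },
  };
}

const memorySink = createMemorySink();
registerSink(memorySink);
registerSink({ write() {}, flush() { throw new Error("sink unreachable"); } });
emitEvent({ name: "tengu_example_event" }); // illustrative event name
const flushFailures = flushAllSinks();
console.log(memorySink.delivered.length, flushFailures.length);
```

Collecting flush failures instead of rethrowing them keeps shutdown moving even when one sink is unreachable.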

Caveats

  • The .node modules are stripped Linux x86-64 ELF shared objects whose internal symbols were not reverse-engineered here. They are part of the original shipped payload, but their concrete behavior has not been analyzed and is left as an open research question.
  • Many tengu_* strings are runtime evidence; the precise sink schema is implementation-defined.
  • This module touches many other modules but owns no model-turn behavior; if a question is about what the model could do or see, it belongs to the context, tools, or sessions modules instead.

Created and maintained by Yingting Huang.