A source code analysis of the three-layer defense system Claude Code uses to prevent model distillation — fake tool injection, connector text summarization, and streamlined output transforms.
Knowledge distillation is a technique where a smaller "student" model learns to replicate the behavior of a larger "teacher" model by training on the teacher's outputs. In the context of commercial LLMs, this means a competitor could systematically record API interactions — inputs and outputs — to train a cheaper model that approximates the original.
Claude Code contains a three-layer anti-distillation system designed to make such recording useless. By poisoning tool definitions, summarizing intermediate reasoning, and obfuscating SDK output, the system attacks distillation at every stage of the pipeline: request, response, and client output.
All three mechanisms are disabled in the public open-source build. The feature() function in the published code (src/_stubs/bun-bundle.ts) always returns false. These defenses only activate in Anthropic's internal builds where the real bun:bundle feature flags resolve to true.
Each layer operates at a different stage of the API lifecycle, creating defense in depth against distillation attempts.
| Layer | Mechanism | Effect |
|---|---|---|
| 1. Fake tool injection | Server injects fake tool definitions into API requests | Distilled models learn nonexistent tools, poisoning training data |
| 2. Connector text summarization | Server summarizes intermediate reasoning between tool calls and signs it cryptographically | Distillers see only summaries |
| 3. Streamlined output | SDK output strips tool details to vague summaries | Observers cannot recover exact tool calls or parameters |
The first layer operates on the API request side. When enabled, Claude Code sends anti_distillation: ['fake_tools'] in the request body. This instructs the Anthropic API server to inject fabricated tool definitions alongside the real ones.
If an attacker records these API interactions for distillation training, their student model will learn to call tools that don't actually exist. The model becomes "poisoned" — it generates tool calls for phantom tools, producing broken outputs that are impossible to debug without understanding the defense.
```ts
// Anti-distillation: send fake_tools opt-in for 1P CLI only
if (
  feature('ANTI_DISTILLATION_CC')
    ? process.env.CLAUDE_CODE_ENTRYPOINT === 'cli' &&
      shouldIncludeFirstPartyOnlyBetas() &&
      getFeatureValue_CACHED_MAY_BE_STALE(
        'tengu_anti_distill_fake_tool_injection',
        false,
      )
    : false
) {
  result.anti_distillation = ['fake_tools']
}
```
Activation requires all four conditions to be true simultaneously:

- The `ANTI_DISTILLATION_CC` compile-time flag is on
- The process is running as the CLI (`CLAUDE_CODE_ENTRYPOINT === 'cli'`)
- The session uses the first-party Anthropic API (`shouldIncludeFirstPartyOnlyBetas()`)
- The `tengu_anti_distill_fake_tool_injection` runtime flag is enabled
The actual fake tool generation happens on Anthropic's API servers, not in Claude Code itself. The client merely sends the opt-in signal. This means the fake tool definitions are dynamically generated and never appear in the open-source code — making them impossible to filter out by static analysis.
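Since the server-side generation is not public, we can only sketch the shape of the mechanism. The following is a hypothetical illustration, assuming a simple tool-definition schema; `makeFakeTools`, the tool names, and `injectFakeTools` are all invented for this example and do not appear in Claude Code:

```ts
// Hypothetical sketch of server-side fake tool injection. The real
// implementation lives on Anthropic's API servers and is not public.
interface ToolDefinition {
  name: string
  description: string
  input_schema: Record<string, unknown>
}

// Fabricate plausible-looking but nonexistent tools (names are invented).
function makeFakeTools(count: number): ToolDefinition[] {
  const names = ['CodeIndex', 'DepGraph', 'SymbolTrace', 'CacheWarm']
  return names.slice(0, count).map(name => ({
    name,
    description: `Internal ${name} helper`,
    input_schema: { type: 'object', properties: { query: { type: 'string' } } },
  }))
}

function injectFakeTools(
  realTools: ToolDefinition[],
  optIn: string[],
): ToolDefinition[] {
  // Only act when the client sent the anti_distillation opt-in.
  if (!optIn.includes('fake_tools')) return realTools
  // Appended here for clarity; a real server could interleave or vary
  // the fakes per request so they are not separable by position.
  return [...realTools, ...makeFakeTools(2)]
}
```

A model trained on transcripts containing these definitions would learn to emit calls to `CodeIndex`-style phantoms, which is exactly the poisoning effect described above.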
The second layer targets the API response. When Claude Code makes agentic tool calls, the model generates intermediate text between tool invocations — reasoning about what to do next, analyzing tool results, etc. This "connector text" is the most valuable training signal for distillation.
With this defense enabled, the Anthropic API server buffers this connector text, replaces it with a summary, and attaches a cryptographic signature. The signature allows the server to restore the original text on subsequent turns, maintaining Claude's reasoning context while hiding it from external observers.
```ts
// POC: server-side connector-text summarization (anti-distillation). The
// API buffers assistant text between tool calls, summarizes it, and returns
// the summary with a signature so the original can be restored on subsequent
// turns — same mechanism as thinking blocks. Ant-only while we measure
// TTFT/TTLT/capacity; betas already flow to tengu_api_success for splitting.
if (
  SUMMARIZE_CONNECTOR_TEXT_BETA_HEADER &&
  process.env.USER_TYPE === 'ant' &&
  includeFirstPartyOnlyBetas &&
  !isEnvDefinedFalsy(process.env.USE_CONNECTOR_TEXT_SUMMARIZATION) &&
  (isEnvTruthy(process.env.USE_CONNECTOR_TEXT_SUMMARIZATION) ||
    getFeatureValue_CACHED_MAY_BE_STALE('tengu_slate_prism', false))
) {
  betaHeaders.push(SUMMARIZE_CONNECTOR_TEXT_BETA_HEADER)
}
```
This mechanism reuses the same signature infrastructure as thinking blocks. In the Anthropic API, extended thinking content is signed so that it can be verified and restored; connector text piggybacks on that signing and restoration system.
Signatures are bound to the API key that generated them. When a user runs /login to switch accounts, Claude Code calls stripSignatureBlocks() to remove all connector_text and thinking blocks — because the stale signatures would be rejected by the API with a 400 error under the new key.
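The cleanup step can be sketched as a filter over the conversation history. This is an illustrative reconstruction, not the actual Claude Code source; the block shapes are assumptions modeled on the public thinking-block format:

```ts
// Sketch of the account-switch cleanup: drop signed blocks minted under the
// old API key so the new key's requests don't fail with a 400 on
// signature verification. Block shapes are assumed for illustration.
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'thinking'; thinking: string; signature: string }
  | { type: 'connector_text'; summary: string; signature: string }

interface Message {
  role: 'user' | 'assistant'
  content: ContentBlock[]
}

function stripSignatureBlocks(history: Message[]): Message[] {
  return history.map(msg => ({
    ...msg,
    // Keep plain text; remove anything carrying a now-stale signature.
    content: msg.content.filter(
      block => block.type !== 'thinking' && block.type !== 'connector_text',
    ),
  }))
}
```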
The third layer is a client-side defense applied to the SDK output stream. When Claude Code operates in "streamlined mode" (e.g., when used through the Agent SDK), the output transformer replaces detailed tool call information with opaque summaries.
```ts
/**
 * Streamlined mode is a "distillation-resistant" output format that:
 * - Keeps text messages intact
 * - Summarizes tool calls with cumulative counts
 * - Omits thinking content
 * - Strips tool list and model info from init messages
 */
```
Instead of seeing the exact tool names, parameters, and results, an SDK consumer receives aggregated summaries like:
```ts
// What the model actually did:
Grep("handleError", "src/**/*.ts")  // 3 results
Read("src/utils/error.ts")          // 142 lines
Read("src/services/api.ts")         // 89 lines
Bash("npm test")                    // exit 0

// What streamlined mode emits:
{
  type: "streamlined_tool_use_summary",
  tool_summary: "Searched 1 pattern, read 2 files, ran 1 command"
}
```
The transformer categorizes all tools into five buckets — searches (Grep, Glob, WebSearch, LSP), reads (FileRead, ListMcpResources), writes (FileWrite, FileEdit, NotebookEdit), commands (shell, Tmux, TaskStop), and other. Counts accumulate across consecutive tool-only messages and reset when text content appears.
This means an attacker observing the SDK stream cannot reconstruct which files were read, what search patterns were used, or what commands were run — only the aggregate counts per category.
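The bucketing described above can be sketched in a few lines. Bucket membership follows the article's categorization; the summary phrasing and function name are assumptions for illustration, not the actual transformer code:

```ts
// Sketch of the five-bucket tool-call aggregation. Membership mirrors the
// categorization described in the article; wording is an assumption.
type Bucket = 'searches' | 'reads' | 'writes' | 'commands' | 'other'

const BUCKETS: Record<string, Bucket> = {
  Grep: 'searches', Glob: 'searches', WebSearch: 'searches', LSP: 'searches',
  FileRead: 'reads', ListMcpResources: 'reads',
  FileWrite: 'writes', FileEdit: 'writes', NotebookEdit: 'writes',
  Bash: 'commands', Tmux: 'commands', TaskStop: 'commands',
}

function summarizeToolCalls(toolNames: string[]): string {
  const counts: Record<Bucket, number> = {
    searches: 0, reads: 0, writes: 0, commands: 0, other: 0,
  }
  for (const name of toolNames) counts[BUCKETS[name] ?? 'other']++

  // Emit only the non-zero buckets, with naive pluralization.
  const parts: string[] = []
  if (counts.searches) parts.push(`Searched ${counts.searches} pattern${counts.searches > 1 ? 's' : ''}`)
  if (counts.reads) parts.push(`read ${counts.reads} file${counts.reads > 1 ? 's' : ''}`)
  if (counts.writes) parts.push(`edited ${counts.writes} file${counts.writes > 1 ? 's' : ''}`)
  if (counts.commands) parts.push(`ran ${counts.commands} command${counts.commands > 1 ? 's' : ''}`)
  return parts.join(', ')
}
```

Feeding it the tool sequence from the earlier example yields the same shape of summary: `summarizeToolCalls(['Grep', 'FileRead', 'FileRead', 'Bash'])` returns `"Searched 1 pattern, read 2 files, ran 1 command"`.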
All three defenses are gated behind a dual feature flag system — compile-time flags baked into the binary and runtime flags served remotely via GrowthBook. This creates a two-key system where both must agree before any defense activates.
```ts
// Stub for bun:bundle feature() function
// All features return false — matches external/public build behavior
export function feature(_name: string): boolean {
  return false
}
```
In the public npm-distributed build, the feature() function is this stub that always returns false. In Anthropic's internal Bun-compiled builds, the real bun:bundle module resolves feature flags at compile time, dead-code eliminating entire code paths when flags are off.
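The two-key pattern can be made concrete with a minimal sketch. `getFeatureValue` here is a hypothetical stand-in for the GrowthBook-backed runtime lookup, and `isFakeToolInjectionEnabled` is an invented wrapper; only the flag names come from the source:

```ts
// Dual-key gating sketch: a compile-time flag ANDed with a runtime flag.
// In the public build the compile-time stub always returns false, so the
// runtime branch is dead code and can be eliminated by the bundler.
function feature(_name: string): boolean {
  return false // public-build stub
}

// Hypothetical stand-in for the GrowthBook-backed runtime flag lookup.
function getFeatureValue(_name: string, fallback: boolean): boolean {
  return fallback
}

function isFakeToolInjectionEnabled(): boolean {
  // Both keys must agree before the defense activates.
  return feature('ANTI_DISTILLATION_CC')
    ? getFeatureValue('tengu_anti_distill_fake_tool_injection', false)
    : false
}
```

Because `feature()` is resolved at compile time in internal builds, turning a flag off removes the entire gated branch from the binary rather than merely skipping it at runtime.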
| Defense Layer | Compile Flag | Runtime Flag | Extra Gate |
|---|---|---|---|
| Fake Tools | ANTI_DISTILLATION_CC | tengu_anti_distill_fake_tool_injection | CLI entrypoint + 1P provider |
| Connector Text | CONNECTOR_TEXT | tengu_slate_prism | USER_TYPE=ant + 1P provider |
| Streamlined Output | N/A (always available) | N/A | SDK streamlined mode |
Putting it all together: how a single API round-trip is protected at every stage.
The client includes `anti_distillation: ['fake_tools']` in the request body, signaling the server to activate fake tool injection. The server replies with connector text replaced by signed summaries, and the client's streamlined transformer strips tool specifics before the SDK consumer sees anything.

For distillation attackers: recording Claude Code API traffic now yields significantly degraded training data. Fake tools poison the training signal, summarized connector text hides the reasoning chain, and streamlined output removes tool specifics. A student model trained on this data would generate calls to nonexistent tools, lack intermediate reasoning capabilities, and have no understanding of actual tool schemas.
For the open-source community: Since all defenses are compile-gated behind feature() stubs that return false, the public Claude Code build is completely unaffected. Users running the npm package or building from source get no anti-distillation overhead. The defense code is visible for analysis but inert.
For Anthropic's internal users: The connector text summarization is currently gated to USER_TYPE=ant (Anthropic employees), suggesting it's still in the measurement/POC phase. The code comments reference monitoring TTFT (time to first token) and TTLT (time to last token), indicating Anthropic is evaluating the performance cost of server-side summarization before broader rollout.
The internal codenames follow a pattern: tengu_ prefix for GrowthBook flags (tengu appears to be Claude Code's internal project name), slate_prism for the connector text experiment, and fake_tool_injection for the tool poisoning experiment. These names appear in analytics logging paths like tengu_api_success for A/B test analysis.