Security Deep Dive

Claude Code's Anti-Distillation System

A source code analysis of the three-layer defense system Claude Code uses to prevent model distillation — fake tool injection, connector text summarization, and streamlined output transforms.

April 1, 2026 · 12 min read
Tags: Anti-Distillation · Claude Code · Source Analysis · Model Security
Overview
What Is Anti-Distillation?

Knowledge distillation is a technique where a smaller "student" model learns to replicate the behavior of a larger "teacher" model by training on the teacher's outputs. In the context of commercial LLMs, this means a competitor could systematically record API interactions — inputs and outputs — to train a cheaper model that approximates the original.

Claude Code contains a three-layer anti-distillation system designed to make such recording useless. By poisoning tool definitions, summarizing intermediate reasoning, and obfuscating SDK output, the system attacks distillation at every stage of the pipeline: request, response, and client output.

Important Note

All three mechanisms are disabled in the public open-source build. The feature() function in the published code (src/_stubs/bun-bundle.ts) always returns false. These defenses only activate in Anthropic's internal builds where the real bun:bundle feature flags resolve to true.

Architecture
Three Layers of Defense

Each layer operates at a different stage of the API lifecycle, creating defense in depth against distillation attempts.

Fake Tool Injection (request side)
Server injects fake tool definitions into API requests. Distilled models learn nonexistent tools, poisoning training data.

Connector Text (response side)
Server summarizes intermediate reasoning between tool calls and signs it cryptographically. Distillers see only summaries.

Streamlined Output (client side)
SDK output strips tool details to vague summaries. Observers cannot recover exact tool calls or parameters.

Layer 1
Fake Tool Injection

The first layer operates on the API request side. When enabled, Claude Code sends anti_distillation: ['fake_tools'] in the request body. This instructs the Anthropic API server to inject fabricated tool definitions alongside the real ones.

If an attacker records these API interactions for distillation training, their student model learns to call tools that don't actually exist. The model is "poisoned": it generates calls to phantom tools, producing broken outputs that are hard to debug without knowledge of the defense.

src/services/api/claude.ts:301-313
// Anti-distillation: send fake_tools opt-in for 1P CLI only
if (
  feature('ANTI_DISTILLATION_CC')
    ? process.env.CLAUDE_CODE_ENTRYPOINT === 'cli' &&
      shouldIncludeFirstPartyOnlyBetas() &&
      getFeatureValue_CACHED_MAY_BE_STALE(
        'tengu_anti_distill_fake_tool_injection',
        false,
      )
    : false
) {
  result.anti_distillation = ['fake_tools']
}

The activation requires all four conditions to be true simultaneously:

Compile-time flag: feature('ANTI_DISTILLATION_CC')
Entry point: CLAUDE_CODE_ENTRYPOINT === 'cli'
Provider: first-party (firstParty / foundry)
Remote flag: tengu_anti_distill_fake_tool_injection
How It Works Server-Side

The actual fake tool generation happens on Anthropic's API servers, not in Claude Code itself. The client merely sends the opt-in signal. This means the fake tool definitions are dynamically generated and never appear in the open-source code — making them impossible to filter out by static analysis.
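Because the generation logic never leaves Anthropic's servers, we can only guess at its shape. The sketch below is a purely hypothetical illustration of what server-side injection might look like: the function name `injectFakeTools`, the decoy tool definitions, and the interleaving strategy are all assumptions, not actual API behavior.

```typescript
interface ToolDefinition {
  name: string
  description: string
  input_schema: Record<string, unknown>
}

// Hypothetical decoy tools that do not exist in the real client. A model
// distilled on responses that reference these would emit unusable calls.
const DECOY_TOOLS: ToolDefinition[] = [
  {
    name: 'SymbolIndexLookup',
    description: 'Look up a symbol in the project-wide index',
    input_schema: {
      type: 'object',
      properties: { symbol: { type: 'string' } },
    },
  },
]

function injectFakeTools(
  realTools: ToolDefinition[],
  optIns: string[],
): ToolDefinition[] {
  // Only activate when the client sent the anti_distillation opt-in.
  if (!optIns.includes('fake_tools')) return realTools
  // Append decoys to the real definitions before the list reaches the model.
  return [...realTools, ...DECOY_TOOLS]
}
```

Since the decoys are generated dynamically per request, a distiller cannot build a static denylist of fake tool names from the client source.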

Layer 2
Connector Text Summarization

The second layer targets the API response. When Claude Code makes agentic tool calls, the model generates intermediate text between tool invocations — reasoning about what to do next, analyzing tool results, etc. This "connector text" is the most valuable training signal for distillation.

With this defense enabled, the Anthropic API server buffers this connector text, replaces it with a summary, and attaches a cryptographic signature. The signature allows the server to restore the original text on subsequent turns, maintaining Claude's reasoning context while hiding it from external observers.

src/utils/betas.ts:279-297
// POC: server-side connector-text summarization (anti-distillation). The
// API buffers assistant text between tool calls, summarizes it, and returns
// the summary with a signature so the original can be restored on subsequent
// turns — same mechanism as thinking blocks. Ant-only while we measure
// TTFT/TTLT/capacity; betas already flow to tengu_api_success for splitting.

if (
  SUMMARIZE_CONNECTOR_TEXT_BETA_HEADER &&
  process.env.USER_TYPE === 'ant' &&
  includeFirstPartyOnlyBetas &&
  !isEnvDefinedFalsy(process.env.USE_CONNECTOR_TEXT_SUMMARIZATION) &&
  (isEnvTruthy(process.env.USE_CONNECTOR_TEXT_SUMMARIZATION) ||
    getFeatureValue_CACHED_MAY_BE_STALE('tengu_slate_prism', false))
) {
  betaHeaders.push(SUMMARIZE_CONNECTOR_TEXT_BETA_HEADER)
}

This mechanism reuses the same signature infrastructure as thinking blocks. In the Anthropic API, extended thinking content is signed so that it can be verified and restored. Connector text piggybacks on this system:

1. Model generates reasoning text: full intermediate reasoning between tool calls (e.g., "The grep output shows 3 matches in utils.ts, I should read that file next...").
2. Server summarizes and signs: the API server replaces the full text with a compressed summary and attaches a cryptographic signature bound to the API key:
   { type: "connector_text", connector_text: "Found matches, reading file...", signature: "sig_abc..." }
3. Client displays summary: Claude Code renders the summarized text to the user. The full reasoning is hidden.
4. Server restores on next turn: when the signed block is sent back, the server verifies the signature and restores the original full text for model context continuity.
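The signing scheme itself is not public. As a rough illustration of the round trip, the sketch below uses an HMAC keyed with server-side material; the key handling, block shape, and storage of originals are all my assumptions, not the real implementation.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

interface ConnectorTextBlock {
  type: 'connector_text'
  connector_text: string // summary shown to the client
  signature: string // binds the hidden original text to this block
}

// Stand-in for key material derived from the caller's API key.
const SERVER_KEY = 'per-api-key-secret'

// Server side: remember originals keyed by signature, return the summary.
const originals = new Map<string, string>()

function summarizeAndSign(fullText: string, summary: string): ConnectorTextBlock {
  const signature = createHmac('sha256', SERVER_KEY).update(fullText).digest('hex')
  originals.set(signature, fullText)
  return { type: 'connector_text', connector_text: summary, signature }
}

// Next turn: verify the signature and restore the hidden original.
function restore(block: ConnectorTextBlock): string | null {
  const fullText = originals.get(block.signature)
  if (fullText === undefined) return null // stale or foreign signature
  const expected = createHmac('sha256', SERVER_KEY).update(fullText).digest('hex')
  const ok = timingSafeEqual(Buffer.from(expected), Buffer.from(block.signature))
  return ok ? fullText : null
}
```

A signature produced under one key is meaningless under another, which is exactly why stale blocks must be stripped on account switch (next section).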
Signature-Key Binding

Signatures are bound to the API key that generated them. When a user runs /login to switch accounts, Claude Code calls stripSignatureBlocks() to remove all connector_text and thinking blocks — because the stale signatures would be rejected by the API with a 400 error under the new key.
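A minimal sketch of what a stripSignatureBlocks()-style cleanup might do on account switch. The block shapes are simplified assumptions; the real function in Claude Code may differ in structure and scope.

```typescript
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'thinking'; thinking: string; signature: string }
  | { type: 'connector_text'; connector_text: string; signature: string }

function stripSignatureBlocks(blocks: ContentBlock[]): ContentBlock[] {
  // Signed blocks are bound to the previous API key; resending them after
  // /login would fail server-side verification with a 400, so drop them.
  return blocks.filter(
    b => b.type !== 'thinking' && b.type !== 'connector_text',
  )
}
```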

Layer 3
Streamlined Output Transform

The third layer is a client-side defense applied to the SDK output stream. When Claude Code operates in "streamlined mode" (e.g., when used through the Agent SDK), the output transformer replaces detailed tool call information with opaque summaries.

src/utils/streamlinedTransform.ts:1-9
/**
 * Streamlined mode is a "distillation-resistant" output format that:
 * - Keeps text messages intact
 * - Summarizes tool calls with cumulative counts
 * - Omits thinking content
 * - Strips tool list and model info from init messages
 */

Instead of seeing the exact tool names, parameters, and results, an SDK consumer receives aggregated summaries like:

Original vs. Streamlined output
// What the model actually did:
Grep("handleError", "src/**/*.ts")  // 3 results
Read("src/utils/error.ts")          // 142 lines
Read("src/services/api.ts")         // 89 lines
Bash("npm test")                    // exit 0

// What streamlined mode emits:
{ type: "streamlined_tool_use_summary",
  tool_summary: "Searched 1 pattern, read 2 files, ran 1 command" }

The transformer categorizes all tools into five buckets — searches (Grep, Glob, WebSearch, LSP), reads (FileRead, ListMcpResources), writes (FileWrite, FileEdit, NotebookEdit), commands (shell, Tmux, TaskStop), and other. Counts accumulate across consecutive tool-only messages and reset when text content appears.

This means an attacker observing the SDK stream cannot reconstruct which files were read, what search patterns were used, or what commands were run — only the aggregate counts per category.
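The bucketing described above can be sketched as follows. The bucket membership mirrors the article's description, but the summary phrasing and the pluralization shortcut are my own simplifications, not the exact transformer output.

```typescript
// Tool-name buckets, per the categorization described in the article.
const BUCKETS: Record<string, string[]> = {
  searched: ['Grep', 'Glob', 'WebSearch', 'LSP'],
  read: ['FileRead', 'ListMcpResources'],
  wrote: ['FileWrite', 'FileEdit', 'NotebookEdit'],
  ran: ['shell', 'Tmux', 'TaskStop'],
}

function summarizeToolCalls(toolNames: string[]): string {
  const counts: Record<string, number> = {
    searched: 0, read: 0, wrote: 0, ran: 0, other: 0,
  }
  for (const name of toolNames) {
    const bucket =
      Object.keys(BUCKETS).find(b => BUCKETS[b].includes(name)) ?? 'other'
    counts[bucket]++
  }
  // Emit only non-empty buckets as an aggregate summary string.
  const parts: string[] = []
  if (counts.searched) parts.push(`searched ${counts.searched} pattern(s)`)
  if (counts.read) parts.push(`read ${counts.read} file(s)`)
  if (counts.wrote) parts.push(`wrote ${counts.wrote} file(s)`)
  if (counts.ran) parts.push(`ran ${counts.ran} command(s)`)
  if (counts.other) parts.push(`used ${counts.other} other tool(s)`)
  return parts.join(', ')
}
```

For the example transcript above, this sketch would emit "searched 1 pattern(s), read 2 file(s), ran 1 command(s)": the categories survive, but the file paths, search pattern, and command are gone.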

Control
Feature Flag Architecture

All three defenses are gated behind a dual feature flag system — compile-time flags baked into the binary and runtime flags served remotely via GrowthBook. This creates a two-key system where both must agree before any defense activates.

src/_stubs/bun-bundle.ts (public build)
// Stub for bun:bundle feature() function
// All features return false — matches external/public build behavior
export function feature(_name: string): boolean {
  return false
}

In the public npm-distributed build, the feature() function is this stub that always returns false. In Anthropic's internal Bun-compiled builds, the real bun:bundle module resolves feature flags at compile time, dead-code eliminating entire code paths when flags are off.
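A toy illustration of the two-key gating, assuming a stand-in `runtimeFlag` lookup in place of the real GrowthBook client. Because the public-build `feature()` stub is hardcoded to return false, a compile-time-aware bundler can prove the whole gated expression false and eliminate the branch entirely.

```typescript
// Public-build stub: every compile-time flag is off.
function feature(_name: string): boolean {
  return false
}

// Stand-in for remotely served GrowthBook flags (assumption, not the real client).
const remoteFlags: Record<string, boolean> = {
  tengu_anti_distill_fake_tool_injection: true, // server says "on"
}

function runtimeFlag(name: string): boolean {
  return remoteFlags[name] ?? false
}

function shouldSendFakeToolOptIn(): boolean {
  // Both keys must agree. With feature() compile-time false, the runtime
  // check is dead code and never even reaches the flag service.
  return (
    feature('ANTI_DISTILLATION_CC') &&
    runtimeFlag('tengu_anti_distill_fake_tool_injection')
  )
}
```

Even with the remote flag enabled, the public build's gate stays closed, which matches the "visible but inert" behavior described above.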

Defense Layer      | Compile Flag            | Runtime Flag                           | Extra Gate
Fake Tools         | ANTI_DISTILLATION_CC    | tengu_anti_distill_fake_tool_injection | CLI entrypoint + 1P provider
Connector Text     | CONNECTOR_TEXT          | tengu_slate_prism                      | USER_TYPE=ant + 1P provider
Streamlined Output | N/A (always available)  | N/A                                    | SDK streamlined mode
Summary
Complete Data Flow

Putting it all together: how a single API round-trip is protected at every stage.

1. Client sends request
Claude Code includes anti_distillation: ['fake_tools'] in the request body, signaling the server to activate fake tool injection.
2. Server injects fake tools
The API server adds fabricated tool definitions to the tool list before passing to the model. The model may reference these tools, poisoning any recorded output.
3. Model generates response
Claude generates reasoning text, tool calls, and results. Between tool calls, intermediate reasoning is produced as connector text.
4. Server summarizes connector text
Intermediate reasoning is replaced with signed summaries. Full text is only recoverable by the original API key holder on the next turn.
5. Client transforms SDK output
If in streamlined mode, the client further reduces tool calls to aggregate counts, stripping all parameter details and tool names.
Implications
What This Means

For distillation attackers: Claude Code API traffic recorded for training is now a significantly degraded signal. Fake tools poison the training set, summarized connector text hides the reasoning chain, and streamlined output removes tool specifics. A student model trained on this data would generate calls to nonexistent tools, lack intermediate reasoning, and have no understanding of the actual tool schemas.

For the open-source community: Since all defenses are compile-gated behind feature() stubs that return false, the public Claude Code build is completely unaffected. Users running the npm package or building from source get no anti-distillation overhead. The defense code is visible for analysis but inert.

For Anthropic's internal users: The connector text summarization is currently gated to USER_TYPE=ant (Anthropic employees), suggesting it's still in the measurement/POC phase. The code comments reference monitoring TTFT (time to first token) and TTLT (time to last token), indicating Anthropic is evaluating the performance cost of server-side summarization before broader rollout.

The Naming Convention

The internal codenames follow a pattern: tengu_ prefix for GrowthBook flags (tengu appears to be Claude Code's internal project name), slate_prism for the connector text experiment, and fake_tool_injection for the tool poisoning experiment. These names appear in analytics logging paths like tengu_api_success for A/B test analysis.