A source code analysis of the three-layer defense system Claude Code uses to prevent model distillation — fake tool injection, connector text summarization, and streamlined output transforms.
Knowledge distillation is a technique where a smaller "student" model learns to replicate the behavior of a larger "teacher" model by training on the teacher's outputs. In the context of commercial LLMs, this means a competitor could systematically record API interactions — inputs and outputs — to train a cheaper model that approximates the original.
Claude Code contains a three-layer anti-distillation system designed to make such recording useless. By poisoning tool definitions, summarizing intermediate reasoning, and obfuscating SDK output, the system attacks distillation at every stage of the pipeline: request, response, and client output.
All three mechanisms are disabled in the public open-source build. The feature() function in the published code (src/_stubs/bun-bundle.ts) always returns false. These defenses only activate in Anthropic's internal builds where the real bun:bundle feature flags resolve to true.
Each layer operates at a different stage of the API lifecycle, creating defense in depth against distillation attempts.
| Layer | Mechanism | Effect |
|---|---|---|
| 1. Fake tool injection | Server injects fake tool definitions into API requests | Distilled models learn nonexistent tools, poisoning training data |
| 2. Connector text summarization | Server summarizes intermediate reasoning between tool calls and signs it cryptographically | Distillers see only summaries |
| 3. Streamlined output | SDK output strips tool details to vague summaries | Observers cannot recover exact tool calls or parameters |
The first layer operates on the API request side. When enabled, Claude Code sends anti_distillation: ['fake_tools'] in the request body. This instructs the Anthropic API server to inject fabricated tool definitions alongside the real ones.
If an attacker records these API interactions for distillation training, their student model will learn to call tools that don't actually exist. The model becomes "poisoned" — it generates tool calls for phantom tools, producing broken outputs that are impossible to debug without understanding the defense.
```ts
// Anti-distillation: send fake_tools opt-in for 1P CLI only
if (
  feature('ANTI_DISTILLATION_CC')
    ? process.env.CLAUDE_CODE_ENTRYPOINT === 'cli' &&
      shouldIncludeFirstPartyOnlyBetas() &&
      getFeatureValue_CACHED_MAY_BE_STALE(
        'tengu_anti_distill_fake_tool_injection',
        false,
      )
    : false
) {
  result.anti_distillation = ['fake_tools']
}
```
Activation requires all four conditions to be true simultaneously:

- The `ANTI_DISTILLATION_CC` compile-time flag is on
- The process is running as the CLI (`CLAUDE_CODE_ENTRYPOINT === 'cli'`)
- The session uses the first-party Anthropic API (`shouldIncludeFirstPartyOnlyBetas()`)
- The `tengu_anti_distill_fake_tool_injection` runtime flag is enabled
The actual fake tool generation happens on Anthropic's API servers, not in Claude Code itself. The client merely sends the opt-in signal. This means the fake tool definitions are dynamically generated and never appear in the open-source code — making them impossible to filter out by static analysis.
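Since the server-side generation is not public, we can only sketch the shape of the mechanism. The following is a hypothetical illustration, assuming a simple tool-definition schema; `makeFakeTools`, the tool names, and `injectFakeTools` are all invented for this example and do not appear in Claude Code:

```ts
// Hypothetical sketch of server-side fake tool injection. The real
// implementation lives on Anthropic's API servers and is not public.
interface ToolDefinition {
  name: string
  description: string
  input_schema: Record<string, unknown>
}

// Fabricate plausible-looking but nonexistent tools (names are invented).
function makeFakeTools(count: number): ToolDefinition[] {
  const names = ['CodeIndex', 'DepGraph', 'SymbolTrace', 'CacheWarm']
  return names.slice(0, count).map(name => ({
    name,
    description: `Internal ${name} helper`,
    input_schema: { type: 'object', properties: { query: { type: 'string' } } },
  }))
}

function injectFakeTools(
  realTools: ToolDefinition[],
  optIn: string[],
): ToolDefinition[] {
  // Only act when the client sent the anti_distillation opt-in.
  if (!optIn.includes('fake_tools')) return realTools
  // Appended here for clarity; a real server could interleave or vary
  // the fakes per request so they are not separable by position.
  return [...realTools, ...makeFakeTools(2)]
}
```

A model trained on transcripts containing these definitions would learn to emit calls to `CodeIndex`-style phantoms, which is exactly the poisoning effect described above.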
The second layer targets the API response. When Claude Code makes agentic tool calls, the model generates intermediate text between tool invocations — reasoning about what to do next, analyzing tool results, etc. This "connector text" is the most valuable training signal for distillation.
With this defense enabled, the Anthropic API server buffers this connector text, replaces it with a summary, and attaches a cryptographic signature. The signature allows the server to restore the original text on subsequent turns, maintaining Claude's reasoning context while hiding it from external observers.
```ts
// POC: server-side connector-text summarization (anti-distillation). The
// API buffers assistant text between tool calls, summarizes it, and returns
// the summary with a signature so the original can be restored on subsequent
// turns — same mechanism as thinking blocks. Ant-only while we measure
// TTFT/TTLT/capacity; betas already flow to tengu_api_success for splitting.
if (
  SUMMARIZE_CONNECTOR_TEXT_BETA_HEADER &&
  process.env.USER_TYPE === 'ant' &&
  includeFirstPartyOnlyBetas &&
  !isEnvDefinedFalsy(process.env.USE_CONNECTOR_TEXT_SUMMARIZATION) &&
  (isEnvTruthy(process.env.USE_CONNECTOR_TEXT_SUMMARIZATION) ||
    getFeatureValue_CACHED_MAY_BE_STALE('tengu_slate_prism', false))
) {
  betaHeaders.push(SUMMARIZE_CONNECTOR_TEXT_BETA_HEADER)
}
```
This mechanism reuses the same signature infrastructure as thinking blocks. In the Anthropic API, extended thinking content is signed so that it can be verified and restored; connector text piggybacks on that signing and restoration system.
Signatures are bound to the API key that generated them. When a user runs /login to switch accounts, Claude Code calls stripSignatureBlocks() to remove all connector_text and thinking blocks — because the stale signatures would be rejected by the API with a 400 error under the new key.
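The cleanup step can be sketched as a filter over the conversation history. This is an illustrative reconstruction, not the actual Claude Code source; the block shapes are assumptions modeled on the public thinking-block format:

```ts
// Sketch of the account-switch cleanup: drop signed blocks minted under the
// old API key so the new key's requests don't fail with a 400 on
// signature verification. Block shapes are assumed for illustration.
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'thinking'; thinking: string; signature: string }
  | { type: 'connector_text'; summary: string; signature: string }

interface Message {
  role: 'user' | 'assistant'
  content: ContentBlock[]
}

function stripSignatureBlocks(history: Message[]): Message[] {
  return history.map(msg => ({
    ...msg,
    // Keep plain text; remove anything carrying a now-stale signature.
    content: msg.content.filter(
      block => block.type !== 'thinking' && block.type !== 'connector_text',
    ),
  }))
}
```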
The third layer is a client-side defense applied to the SDK output stream. When Claude Code operates in "streamlined mode" (e.g., when used through the Agent SDK), the output transformer replaces detailed tool call information with opaque summaries.
```ts
/**
 * Streamlined mode is a "distillation-resistant" output format that:
 * - Keeps text messages intact
 * - Summarizes tool calls with cumulative counts
 * - Omits thinking content
 * - Strips tool list and model info from init messages
 */
```
Instead of seeing the exact tool names, parameters, and results, an SDK consumer receives aggregated summaries like:
```ts
// What the model actually did:
Grep("handleError", "src/**/*.ts")  // 3 results
Read("src/utils/error.ts")          // 142 lines
Read("src/services/api.ts")         // 89 lines
Bash("npm test")                    // exit 0

// What streamlined mode emits:
{
  type: "streamlined_tool_use_summary",
  tool_summary: "Searched 1 pattern, read 2 files, ran 1 command"
}
```
The transformer categorizes all tools into five buckets — searches (Grep, Glob, WebSearch, LSP), reads (FileRead, ListMcpResources), writes (FileWrite, FileEdit, NotebookEdit), commands (shell, Tmux, TaskStop), and other. Counts accumulate across consecutive tool-only messages and reset when text content appears.
This means an attacker observing the SDK stream cannot reconstruct which files were read, what search patterns were used, or what commands were run — only the aggregate counts per category.
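The bucketing described above can be sketched in a few lines. Bucket membership follows the article's categorization; the summary phrasing and function name are assumptions for illustration, not the actual transformer code:

```ts
// Sketch of the five-bucket tool-call aggregation. Membership mirrors the
// categorization described in the article; wording is an assumption.
type Bucket = 'searches' | 'reads' | 'writes' | 'commands' | 'other'

const BUCKETS: Record<string, Bucket> = {
  Grep: 'searches', Glob: 'searches', WebSearch: 'searches', LSP: 'searches',
  FileRead: 'reads', ListMcpResources: 'reads',
  FileWrite: 'writes', FileEdit: 'writes', NotebookEdit: 'writes',
  Bash: 'commands', Tmux: 'commands', TaskStop: 'commands',
}

function summarizeToolCalls(toolNames: string[]): string {
  const counts: Record<Bucket, number> = {
    searches: 0, reads: 0, writes: 0, commands: 0, other: 0,
  }
  for (const name of toolNames) counts[BUCKETS[name] ?? 'other']++

  // Emit only the non-zero buckets, with naive pluralization.
  const parts: string[] = []
  if (counts.searches) parts.push(`Searched ${counts.searches} pattern${counts.searches > 1 ? 's' : ''}`)
  if (counts.reads) parts.push(`read ${counts.reads} file${counts.reads > 1 ? 's' : ''}`)
  if (counts.writes) parts.push(`edited ${counts.writes} file${counts.writes > 1 ? 's' : ''}`)
  if (counts.commands) parts.push(`ran ${counts.commands} command${counts.commands > 1 ? 's' : ''}`)
  return parts.join(', ')
}
```

Feeding it the tool sequence from the earlier example yields the same shape of summary: `summarizeToolCalls(['Grep', 'FileRead', 'FileRead', 'Bash'])` returns `"Searched 1 pattern, read 2 files, ran 1 command"`.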
All three defenses are gated behind a dual feature flag system — compile-time flags baked into the binary and runtime flags served remotely via GrowthBook. This creates a two-key system where both must agree before any defense activates.
```ts
// Stub for bun:bundle feature() function
// All features return false — matches external/public build behavior
export function feature(_name: string): boolean {
  return false
}
```
In the public npm-distributed build, the feature() function is this stub that always returns false. In Anthropic's internal Bun-compiled builds, the real bun:bundle module resolves feature flags at compile time, dead-code eliminating entire code paths when flags are off.
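The two-key pattern can be made concrete with a minimal sketch. `getFeatureValue` here is a hypothetical stand-in for the GrowthBook-backed runtime lookup, and `isFakeToolInjectionEnabled` is an invented wrapper; only the flag names come from the source:

```ts
// Dual-key gating sketch: a compile-time flag ANDed with a runtime flag.
// In the public build the compile-time stub always returns false, so the
// runtime branch is dead code and can be eliminated by the bundler.
function feature(_name: string): boolean {
  return false // public-build stub
}

// Hypothetical stand-in for the GrowthBook-backed runtime flag lookup.
function getFeatureValue(_name: string, fallback: boolean): boolean {
  return fallback
}

function isFakeToolInjectionEnabled(): boolean {
  // Both keys must agree before the defense activates.
  return feature('ANTI_DISTILLATION_CC')
    ? getFeatureValue('tengu_anti_distill_fake_tool_injection', false)
    : false
}
```

Because `feature()` is resolved at compile time in internal builds, turning a flag off removes the entire gated branch from the binary rather than merely skipping it at runtime.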
| Defense Layer | Compile Flag | Runtime Flag | Extra Gate |
|---|---|---|---|
| Fake Tools | ANTI_DISTILLATION_CC | tengu_anti_distill_fake_tool_injection | CLI entrypoint + 1P provider |
| Connector Text | CONNECTOR_TEXT | tengu_slate_prism | USER_TYPE=ant + 1P provider |
| Streamlined Output | N/A (always available) | N/A | SDK streamlined mode |
Putting it all together: how a single API round-trip is protected at every stage.
The client includes `anti_distillation: ['fake_tools']` in the request body, signaling the server to activate fake tool injection. The server replies with connector text replaced by signed summaries, and the client's streamlined transformer strips tool specifics before the SDK consumer sees anything.

For distillation attackers: recording Claude Code API traffic now yields significantly degraded training data. Fake tools poison the training signal, summarized connector text hides the reasoning chain, and streamlined output removes tool specifics. A student model trained on this data would generate calls to nonexistent tools, lack intermediate reasoning capabilities, and have no understanding of actual tool schemas.
For the open-source community: Since all defenses are compile-gated behind feature() stubs that return false, the public Claude Code build is completely unaffected. Users running the npm package or building from source get no anti-distillation overhead. The defense code is visible for analysis but inert.
For Anthropic's internal users: The connector text summarization is currently gated to USER_TYPE=ant (Anthropic employees), suggesting it's still in the measurement/POC phase. The code comments reference monitoring TTFT (time to first token) and TTLT (time to last token), indicating Anthropic is evaluating the performance cost of server-side summarization before broader rollout.
The internal codenames follow a pattern: tengu_ prefix for GrowthBook flags (tengu appears to be Claude Code's internal project name), slate_prism for the connector text experiment, and fake_tool_injection for the tool poisoning experiment. These names appear in analytics logging paths like tengu_api_success for A/B test analysis.