Claude Code Token Usage Is a Context Management Problem

Why Claude Code Uses So Many Tokens

Most developers assume Claude Code token usage comes from long prompts.

That is only part of the problem.

Claude Code does not just read the message you type. It also works with previous messages, opened files, terminal output, test logs, project instructions, MCP tools, compacted summaries, and reasoning steps.

So even a short prompt can become expensive.

For example:

Fix the auth issue.

This looks small, but Claude may need to inspect routes, middleware, session helpers, auth files, tests, and old command output before it understands the task.

That is why token usage is really a context problem.

Claude Code becomes expensive when it has to process too much unnecessary information. The cleaner the context, the easier it is for Claude to work faster, stay focused, and use fewer tokens.

The fix is cleaner context management, not just shorter prompts.

What Context Management Means in Claude Code

Context management means controlling what Claude Code can see, remember, read, and carry forward while working on a task.

In simple terms, context is Claude Code's working memory.

That memory can include:

Your prompt
Earlier messages
Files Claude has opened
CLAUDE.md
Terminal output
Test logs
MCP tool definitions
Search results
Previous plans
Failed attempts
Compacted summaries

The more unnecessary information inside that memory, the more tokens Claude Code may use.

Bad context looks like this:

A long session with three unrelated tasks, full CI logs, old failed attempts, repeated instructions, and files that no longer matter.

Good context looks like this:

A focused session with one task, clear scope, relevant files, filtered errors, and a specific verification target.

Figure: Bloated session vs. focused session. A bloated session costs ~186k tokens. A focused session with one task, filtered errors, and clear scope costs ~28k tokens.

This is why token optimization is not only about writing less.

It is about giving Claude Code the right information at the right time and keeping everything else out.
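To make the cost of extra context concrete, here is a minimal sketch of a rough token estimator. It assumes about 4 characters per token, which is a coarse rule of thumb and not Claude's actual tokenizer, so treat the numbers as relative rather than exact. The function name `estimate_tokens` is made up for this example.

```shell
# Rough token estimate for whatever text arrives on stdin.
# Assumption: ~4 characters per token. This is a coarse heuristic,
# not Claude's real tokenizer, so use it only for comparisons.
estimate_tokens() {
  chars=$(wc -c | tr -d ' ')      # byte count, with any padding stripped
  echo $(( (chars + 3) / 4 ))     # divide by 4, rounding up
}

printf 'Fix the auth issue.' | estimate_tokens   # prints 5
```

Piping a full log and then a filtered excerpt through the same function gives a quick before-and-after comparison of what you are about to hand to Claude.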

Why Context Management Also Improves Output Quality

Reducing Claude Code token usage is not only about cost.

It also improves reliability.

When Claude Code has too much context, it may:

Reference old decisions
Follow outdated instructions
Reprocess failed attempts
Inspect unrelated files
Spend time on unnecessary tools
Lose focus after long sessions
Make changes outside the real scope

Cleaner context usually means fewer wrong edits, faster responses, and better engineering judgment.

That is why the best Claude Code workflows treat context like an engineering resource.

It should be small, relevant, current, and tied to the task.

Practical Ways to Reduce Claude Code Token Usage

1. Scope the Task Before Claude Starts Searching

The fastest way to waste tokens is to give Claude Code a vague task.

Weak prompt:

Improve the dashboard.

This can make Claude inspect too many files because it does not know what "improve" means.

A better prompt:

Fix the tooltip alignment issue in src/features/dashboard/RevenueChart.tsx.

Scope:
- Inspect only nearby chart components if needed.
- Do not refactor unrelated dashboard files.
- Do not change API logic.

Verification:
- Run the existing chart test if available.

This prompt is longer, but it is cheaper in practice because it narrows the search space.

A good Claude Code task should define:

The exact file, folder, or feature area
What needs to change
What should not change
How success will be verified

The goal is not a shorter prompt.

The goal is a smaller search space.
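The checklist above can be turned into a quick pre-flight check. This is a minimal sketch: the function name `missing_sections` is hypothetical, and it assumes your prompts use the same `Scope:` and `Verification:` labels as the example in this section.

```shell
# Report which sections a task prompt (read from stdin) is missing.
# Assumption: prompts follow this article's "Scope:" / "Verification:" labels.
missing_sections() {
  prompt=$(cat)
  missing=""
  for label in "Scope:" "Verification:"; do
    # A section counts as present if a line starts with its label.
    if ! printf '%s\n' "$prompt" | grep -q "^$label"; then
      missing="$missing $label"
    fi
  done
  printf '%s\n' "${missing# }"    # drop the leading space, if any
}

printf 'Improve the dashboard.\n' | missing_sections
# prints: Scope: Verification:
```

An empty result means both sections are present. It does not judge whether the scope is any good, only that you wrote one.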

2. Keep `CLAUDE.md` Useful, Not Bloated

CLAUDE.md is one of the most useful parts of Claude Code, but it can quietly increase token usage if it becomes too large.

Use it only for rules Claude needs in most sessions.

Good CLAUDE.md content:

# Project rules

- Use pnpm, not npm.
- Run pnpm typecheck before final response.
- Use existing UI components from src/components/ui.
- Do not edit migration files unless explicitly asked.
- Do not change auth, billing, or permissions logic without explaining the risk first.

Avoid adding:

Product history
Meeting notes
Full API documentation
Old debugging notes
Client-specific instructions
One-time workflows
Long implementation plans
Generic advice like "write clean code"

The practical rule:

If Claude needs it in almost every session, keep it in CLAUDE.md. If it is only for PR reviews, migrations, releases, or audits, move it into a separate skill or workflow file.

This keeps the default context small while preserving specialized knowledge when needed.
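One way to notice when the file starts to drift is a simple size check. The sketch below is an illustration only: `check_instructions_size` is a hypothetical helper, and the 100-line default budget is an arbitrary number chosen for the example, not an official limit.

```shell
# Warn when an instructions file grows past a line budget.
# Assumption: the 100-line default is arbitrary; pick what suits your project.
check_instructions_size() {
  file=$1
  budget=${2:-100}
  lines=$(wc -l < "$file" | tr -d ' ')
  if [ "$lines" -gt "$budget" ]; then
    echo "WARN: $file has $lines lines (budget $budget)"
    return 1
  fi
  echo "OK: $file has $lines lines (budget $budget)"
}

# Demo on a throwaway file so the example runs anywhere.
sample=$(mktemp)
printf '%s\n' "- Use pnpm, not npm." \
  "- Run pnpm typecheck before final response." \
  "- Do not edit migration files unless explicitly asked." > "$sample"
check_instructions_size "$sample" 2 || true   # prints a WARN line: 3 lines exceed the budget of 2
rm -f "$sample"
```

In a real project you would call it as `check_instructions_size CLAUDE.md`. Because it returns a non-zero status when the budget is exceeded, it could also run as a pre-commit or CI check.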

3. Use `/usage`, `/context`, `/compact`, and `/clear` Intentionally

Many developers let Claude Code sessions run for too long.

That creates stale context.

Use:

/usage

to check token usage.

Use:

/context

to see what is filling the context window.

Use:

/compact

when the task is still active but the session has become heavy.

But do not compact without instructions. A bare /compact leaves Claude to decide what is worth keeping.

Weak compaction:

/compact

Better compaction:

/compact Preserve only:
- Current task goal
- Files changed
- Key decisions
- Test results
- Remaining issues

Remove:
- Old logs
- Failed attempts
- Unrelated exploration
- Repeated explanations

Figure: Four Claude Code session commands: /usage to monitor, /context to inspect, /compact to trim, /clear to reset. Use them intentionally, not reflexively.

Use:

/clear

when the task is complete or when you are switching to unrelated work.

For example, do not use the same session for:

Fix login validation.
Rewrite pricing page copy.
Debug deployment logs.
Review database migration.

Each task carries different context.

A simple workflow:

Finish task → Save short summary → /clear → Start next task

This prevents old decisions and noisy logs from leaking into the next task.

4. Filter Logs Before Claude Sees Them

Raw logs are one of the biggest sources of token waste.

Do not paste 5,000 lines of CI output into Claude Code.

Give Claude the relevant failure.

Instead of:

[paste full CI log]

Use:

Command:
pnpm test LoginForm.test.tsx

Error:
Expected: "Invalid email"
Received: "Email is required"

Failing file:
src/auth/LoginForm.test.tsx

Recently changed file:
src/auth/LoginForm.tsx

Please inspect only the validation flow.

For repeated debugging, filter output first:

pnpm test 2>&1 | grep -A 8 -E "FAIL|ERROR|Expected|Received" | head -n 120

Figure: Raw CI logs (5,000 lines) vs. filtered signal (12 lines). Give Claude the signal, not the noise.

This gives Claude the signal without making it process the noise.

The same applies to:

Build logs
Deployment logs
Server traces
Test output
Lint results
Browser console output

Claude does not need every line.

It needs the lines that explain the failure.
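If you filter logs often, the pipeline from this section can be wrapped in a small reusable function. This is a sketch, not a standard tool: `failures_only` is a made-up name, the patterns assume Jest or Vitest-style output like the example error above, and the sample log lines are invented for the demo.

```shell
# Keep only failure lines plus a few lines of trailing context, capped in length.
# Assumption: the patterns match Jest/Vitest-style output; adjust for other runners.
failures_only() {
  context=${1:-8}       # lines of context to keep after each match
  max_lines=${2:-120}   # hard cap on total output
  grep -A "$context" -E "FAIL|ERROR|Expected|Received" | head -n "$max_lines"
}

# Demo on an invented log: prints the FAIL line, the Expected/Received lines,
# and one line of trailing context. The earlier PASS lines are dropped.
failures_only 1 20 <<'EOF'
PASS src/ui/Button.test.tsx
PASS src/ui/Card.test.tsx
FAIL src/auth/LoginForm.test.tsx
  Expected: "Invalid email"
  Received: "Email is required"
PASS src/ui/Modal.test.tsx
PASS src/ui/Tabs.test.tsx
EOF
```

In practice you would run something like `pnpm test 2>&1 | failures_only` and paste the result, not the raw output.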

5. Control MCP Tools and Use CLI When Enough

MCP tools are powerful, but they should not be available by default for every task.

If Claude has access to GitHub, Slack, Sentry, Linear, databases, cloud tools, and internal docs during a small UI fix, the session can become unnecessarily heavy. Tool definitions can take up context even if the tools are never called.

For a focused task, say:

For this task:
- Use local files only.
- Do not use MCP tools.
- Do not inspect GitHub, Slack, Sentry, or production logs.
- Run only the nearest frontend test if needed.

Use MCP when Claude needs rich tool interaction.

Use CLI when a narrow command is enough.

For example:

gh pr view 123

is often cleaner than exposing a large GitHub tool surface.

Similarly:

gcloud run services describe my-service --region us-central1

may be enough for a deployment check.

The practical rule:

Do not connect every tool to every session. Give Claude only the tools needed for the task.

6. Match Model, Effort, and Subagents to the Task

Not every Claude Code task needs the strongest model, highest effort, or a subagent.

Use normal settings for:

UI fixes
Small bugs
Tests
Copy changes
Simple refactors
Documentation updates

Use stronger reasoning for:

Architecture decisions
Security-sensitive logic
Production incidents
Multi-service debugging
Complex migrations

A practical default:

/model sonnet
/effort medium

Use heavier settings only when the risk justifies the cost.

Subagents should also be used carefully.

Use subagents for noisy or broad investigation:

Use a subagent to inspect the payment webhook flow.

Return only:
- Relevant files
- Current flow summary
- Risky edge cases
- Tests that should be run

Do not modify files.

Avoid subagents for small edits, simple renames, formatting, or one-file changes.

The goal is to use more reasoning only when it creates real value.

7. Store Durable Context Outside the Chat

Many developers keep long Claude Code sessions open because they do not want to lose progress.

That creates expensive context.

A better approach is to save useful context in small project files.

Create:

docs/agent-context/current-task.md
docs/agent-context/decisions.md
docs/agent-context/test-notes.md
docs/agent-context/next-steps.md

Example:

# Current Task

Goal:
Fix login validation and add targeted tests.

Files:
- src/auth/LoginForm.tsx
- src/auth/validation.ts
- src/auth/LoginForm.test.tsx

Constraints:
- Do not change API client.
- Reuse existing error UI.
- Do not introduce new dependencies.

Verification:
- pnpm test LoginForm.test.tsx
- pnpm typecheck

Then start a clean session:

Read docs/agent-context/current-task.md first.
Follow only the scope defined there.
Do not inspect unrelated auth files unless needed.

This gives Claude the context it needs without dragging the entire old conversation forward.
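Writing the handoff note can be scripted so it takes a few seconds at the end of a task. The sketch below is one possible shape: `save_task_note` and the `AGENT_CONTEXT_DIR` variable are hypothetical names, and the note covers only the Goal and Files fields from the example above.

```shell
# Save a short handoff note before clearing a session.
# Assumption: the default directory mirrors the layout suggested in this section;
# AGENT_CONTEXT_DIR is a made-up override for this example.
save_task_note() {
  dir=${AGENT_CONTEXT_DIR:-docs/agent-context}
  goal=$1
  shift                          # remaining arguments are the relevant files
  mkdir -p "$dir"
  {
    printf '%s\n' "# Current Task" "" "Goal:" "$goal" "" "Files:"
    for f in "$@"; do
      printf '%s\n' "- $f"
    done
  } > "$dir/current-task.md"
}

# Example call (writes docs/agent-context/current-task.md):
# save_task_note "Fix login validation and add targeted tests." \
#   src/auth/LoginForm.tsx src/auth/validation.ts
```

Constraints and verification steps can be added the same way. The point is that the note stays short enough to read at the start of the next session.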

A Lean Claude Code Workflow Developers Can Copy

Start each task with this structure:

Task:
Fix [specific issue].

Scope:
- Work only in [files/folders].
- Do not touch [sensitive areas].
- Avoid unrelated refactors.

Verification:
- Run [specific test].
- Run [typecheck/lint if needed].

During the session:

/usage
/context

If the task is still active but the session is large:

/compact Preserve the goal, changed files, decisions, test results, and remaining issues. Remove old logs and unrelated exploration.

If the task is complete:

Write a short summary into docs/agent-context/current-task.md, then use /clear.

For noisy debugging:

pnpm test 2>&1 | grep -A 8 -E "FAIL|ERROR|Expected|Received" | head -n 120

For tool-heavy work:

Use only the tools required for this task. Prefer CLI commands when they are enough.

Figure: The lean Claude Code workflow in six principles (scope, CLAUDE.md, commands, logs, tools, model). Treat context like an engineering resource: keep it small, relevant, current, and task-tied.

Lower Token Usage Should Not Mean Lower Engineering Quality

Bad token optimization makes Claude under-informed.

Good token optimization keeps Claude focused.

Do not reduce token usage by hiding requirements, skipping verification, disabling reasoning for complex work, or forcing Claude to edit code without enough context.

Reduce token usage by removing noise:

Vague prompts
Long stale sessions
Oversized CLAUDE.md
Raw logs
Unused MCP tools
Unnecessary subagents
Old failed attempts
Unrelated file exploration

Claude Code is powerful because it can work across files, tools, commands, and development workflows.

That power becomes expensive when context is unmanaged.

The practical way to reduce Claude Code token usage is to treat context like an engineering resource: keep it small, relevant, current, and tied to the task at hand.
