Stop Burning Tokens: A Practical Guide to Using Claude and Claude Code Efficiently

Claude and Claude Code can dramatically improve the way business users, analysts, architects, product managers, and software developers work. They can write documents, analyze requirements, review code, design systems, automate repetitive tasks, and help teams move faster. But there is a hidden operational discipline behind effective usage: token management.

Most people do not fail with Claude because they write “bad prompts.” They fail because they unintentionally create expensive conversations. They paste too much context. They keep long sessions alive after the useful work is finished. They attach large documents when a brief excerpt would suffice. They ask Claude Code to “look around the repo” instead of pointing it to the right files. They let MCP servers, plugins, skills, subagents, and memory files accumulate until every message carries unnecessary baggage. Then they are surprised when they hit usage limits, consume too much quota, or see API costs grow faster than expected.

This article explains how to use Claude and Claude Code effectively from a token-usage perspective. It is written for both business users and software developers. Business users need to understand how conversations, files, documents, and repetitive work consume tokens. Developers need to understand how Claude Code uses context, how codebase exploration becomes expensive, how model choice affects usage, and how to structure sessions so the model spends its budget on reasoning and implementation rather than on re-reading irrelevant history.

1. The core idea: tokens are not just what you type

The first mental model to understand is that token usage is not limited to the words in your latest prompt.

A token is a small unit of text processed by the model. In practice, a token can be part of a word, a whole word, punctuation, whitespace, code syntax, or structured data. For everyday understanding, it is enough to think of tokens as the “text units” Claude reads and writes.

The mistake many users make is assuming that a short prompt is always cheap. It is not. In a long conversation, Claude not only processes your latest message. It also needs relevant conversation history, system instructions, tool definitions, memory files, attached content, previous outputs, and sometimes other context loaded by the application or development environment. This is why a simple follow-up question in a long session can cost much more than the same question in a fresh session.

For business users, this means a conversation about a strategy document, proposal, contract, or report can become expensive if it contains many previous drafts, attachments, rewrites, comments, and side discussions. For developers, this means a Claude Code session can become expensive because the model may carry previous file reads, command outputs, logs, diffs, test failures, architectural discussion, and implementation attempts into later turns.

The practical lesson is simple: token usage compounds with context. The longer and noisier the context, the more every future message costs.

Claude Code already includes cost-management features such as prompt caching, auto-compaction, usage reporting, model selection, context inspection, and background summarisation. These features help, but they do not remove the need for disciplined workflow design. The best users treat context as a working set, not as an infinite notebook.

2. Why token discipline matters for business users

Business users often experience token waste differently from developers do. They may not see a terminal or token counter. They simply notice that the assistant becomes slower, less focused, or hits usage limits. The root causes are usually predictable.

The first cause is large attachments. A PDF, Word document, spreadsheet, slide deck, screenshot, or exported web page may contain far more hidden content than the user expects. A document may include metadata, formatting, tables, repeated headers, footers, comments, images, and irrelevant sections. When users upload the whole file and ask a narrow question, Claude may have to process far more information than the task requires.

The second cause is repeated rewriting. Business users often ask for “make it better,” “make it more executive,” “make it shorter,” “now make it more formal,” “now add more detail,” and so on. Each iteration may carry the full previous conversation and previous drafts. If the user asks Claude to rewrite the entire document every time, output tokens grow quickly. A better approach is to work section by section and ask for targeted changes.

The third cause is unclear scope. A vague request such as “analyze this business case” or “review this strategy” encourages the model to read widely, infer context, and produce broad commentary. A precise request, such as “review only the executive summary for clarity, decision logic, and missing financial assumptions,” is usually cheaper and better.

The fourth cause is unnecessary politeness and filler. This does not mean users should be rude. It means they should avoid long ritual prompts full of non-functional text. Claude does not need two paragraphs of ceremony before every instruction. In a long session, repeated fillers add up.

For business users, token-efficient prompting usually means:

  • Provide the minimum context required for the decision.
  • Identify the exact output format.
  • Specify what not to rewrite.
  • Ask for changes to a section rather than regenerating the whole document.
  • Start a new conversation when the topic changes.
  • Summarise or extract only the relevant parts of large files before requesting analysis.
  • Avoid keeping old drafts, side discussions, and unrelated decisions in the same chat.

This is not only about cost. It improves quality. Claude performs better when the important signal is not buried inside an irrelevant context.

3. Why token discipline matters even more in Claude Code

Claude Code is more powerful than a normal chat session because it can work with your repository, read files, run commands, edit code, analyze errors, and iterate. That power also creates more ways to spend tokens.

A software development session may include:

  • project instructions from CLAUDE.md
  • conversation history
  • files Claude has read
  • search results
  • shell command outputs
  • compiler errors
  • test output
  • logs
  • diffs
  • tool definitions
  • MCP server metadata
  • plugin context
  • subagent summaries
  • planning notes
  • implementation attempts
  • user corrections

If you ask Claude Code to make a change without giving a clear scope, it may inspect many files, run broad searches, read irrelevant modules, execute tests with verbose output, and carry all of that context forward. The result can be high token usage before any useful code is written.

This does not mean Claude Code is inefficient. It means agentic coding needs a workflow. A human developer does not open every file in the repository before changing one function. A good developer narrows the problem. Claude Code needs the same guidance.

A poor prompt is:

Fix the authentication system.

A better prompt is:

The refresh token flow returns 401 after the access token's expiry. Start with src/auth/refresh.ts, src/auth/session.ts, and the tests under tests/auth. Do not refactor unrelated login code. First, explain the likely cause, then propose a minimal change and test plan.

The second prompt saves tokens by narrowing the search space. It also reduces the chance of expensive rework.

4. Track usage before optimizing blindly

The first step is measurement. Without measurement, token optimization becomes superstition.

Claude Code provides the /usage command. It shows token usage statistics for the current session. For API users, it can estimate costs based on local token counts, though actual billing should be verified in the Claude Console. For Pro, Max, Team, or Enterprise plans, /usage also shows plan usage information, activity statistics, and usage breakdowns. It can attribute recent usage to skills, subagents, plugins, and individual MCP servers. The numbers are approximate and local to the machine, but they are useful for understanding what is consuming context.

Developers should use /usage regularly, especially after:

  • opening a large repository;
  • adding MCP servers;
  • enabling plugins or skills;
  • running large test suites;
  • reading logs;
  • spawning subagents;
  • working in a long session;
  • using plan mode for a complex task;
  • attaching documents or screenshots.

Claude Code also supports /context, which helps identify what is consuming the context window. This is important because token waste is often hidden. The user may think the prompt is small, but the session may already contain a large CLAUDE.md, active MCP definitions, plugin context, previous command outputs, and a long conversation history.

For teams, measurement should be part of the rollout. Anthropic’s official cost guidance recommends establishing a baseline with a small pilot group before wider adoption. Per-developer cost varies by model choice, codebase size, usage patterns, the number of instances, automation, and agent teams. A small pilot reveals whether the team’s usage pattern is lightweight, moderate, or heavy.

For organizations using API billing, workspace spend limits and usage reporting should be configured in the Claude Console. For subscription users, the relevant experience is plan usage and usage credits. On some plans, /usage-credits can be used to set monthly spend limits for usage credits. For enterprise environments using Bedrock, Vertex, or other gateways, organizations may need external tracking because usage metrics may not be returned from the cloud provider.

The key principle is: do not optimize in the abstract. Measure first, then reduce the biggest sources of waste.

5. Model choice: use the right model for the job

Model selection is one of the most important cost decisions.

Opus model is the premium reasoning model and should be treated as a scarce resource. It is excellent for difficult architectural reasoning, complex planning, ambiguous debugging, deep design trade-offs, and high-stakes decisions. But not every action in a coding session needs Opus-level reasoning.

Sonnet handles most coding tasks well and is more cost-effective. For many implementation, refactoring, test-writing, documentation, and routine analysis tasks, Sonnet is the right default. Haiku can be useful for simple subagent tasks where speed and low cost matter more than deep reasoning.

A practical Claude Code model strategy is:

  • Use Sonnet as the default model for normal development.
  • Reserve Opus for complex reasoning, architecture, design, and planning.
  • Use Haiku for simple, isolated subagent tasks where appropriate.
  • Switch models intentionally with /model.

The most important workflow pattern is using Opus only where it creates the most value: planning.

In Claude Code, you can use:

/model opusplan

This sets the model behavior so that Opus is used in plan mode and Sonnet otherwise, and it can be saved as your default for new sessions. This is a powerful token-efficiency pattern because it lets you use Opus 4.8 for the thinking-heavy part of the workflow while using Sonnet for the execution-heavy part.

The idea is simple:

  • Planning is where mistakes are expensive.
  • Implementation involves many tokens, file edits, test runs, and follow-ups.
  • Opus is valuable for deciding the right approach.
  • Sonnet is usually sufficient for carrying out the approach.

This is similar to using a senior architect for the design review and a strong engineering team for implementation. You do not need the most expensive reasoning model for every file edit, every diff response, or every routine test update.

6. Use plan mode before expensive implementation

Plan mode is one of the best ways to prevent token waste in Claude Code.

Complex coding tasks often become expensive because Claude starts implementing too early, discovers missing information, changes direction, makes errors, modifies the wrong abstraction, or refactors too broadly. Every mistaken step consumes tokens: file reads, code edits, command outputs, test failures, correction prompts, and follow-up diffs.

Plan mode reduces this by forcing an analysis-first workflow. Before editing code, Claude explores the relevant parts of the project and proposes an approach. The user can approve, reject, or adjust the plan. This is especially valuable for tasks involving architecture, migrations, security changes, data models, cross-cutting refactoring, performance issues, or unfamiliar codebases.

With /model opusplan, the workflow becomes even stronger:

  1. Enter plan mode.
  2. Let Opus reason about the problem and propose the plan.
  3. Review and correct the plan.
  4. Exit plan mode and let Sonnet implement.
  5. Test incrementally.
  6. Stop early if the implementation drifts.

This avoids paying the premium reasoning cost for every execution step while still benefiting from strong reasoning where it matters most.

A good planning prompt looks like this:

Use plan mode. I need to add password-reset support to the FastAPI backend.

Scope:
- auth routes only
- email token generation
- token expiry validation
- tests for success, expired token, invalid token
- no frontend changes yet

First, inspect the relevant files and propose a minimal implementation plan.
Do not edit files until I approve the plan.

This prompt saves tokens by narrowing the task, preventing premature edits, and giving Claude a clear boundary.

7. Manage context proactively: /clear, /compact, /resume, and /rename

Context management is the foundation of token efficiency.

Claude Code provides several commands that help manage session context:

  • /clear starts fresh.
  • /compact summarises the current conversation.
  • /resume returns to a previous session.
  • /rename gives a session a meaningful name before clearing or switching.

Use /clear when switching to an unrelated task. If you finished working on authentication and now want to redesign a reporting engine, do not drag the old authentication context into the new task. Stale context wastes tokens on every future message and may confuse the model.

Use /compact when you are continuing the same broader task but want to reduce accumulated noise. Compaction summarises the session so you can continue with a smaller context. It is useful after completing a phase: investigation, design, implementation, test repair, or documentation. The best time to compact is before the session becomes overloaded, not after Claude starts losing track.

A practical pattern is:

/compact Focus on code changes, test results, open decisions, and remaining TODOs.

Or:

/compact Focus on API usage, files modified, architectural decisions, and failing tests.

Custom compaction instructions matter. A generic compact may preserve too much irrelevant detail or lose important task state. If you tell Claude what to preserve, you get a more useful summary.

You can also place compact instructions in CLAUDE.md, for example:

# Compact instructions

When compacting, preserve:

- files changed
- tests added or modified
- current failing tests
- architectural decisions
- unresolved questions

Discard:

- command noise
- successful test logs
- repeated explanations
- obsolete implementation attempts

However, this creates a trade-off. Anything in CLAUDE.md is loaded into context. Keep it concise.

The decision between /clear and /compact is simple:

  • Use /clear when the next task is unrelated.
  • Use /compact when the next task continues the same line of work.
  • Use /rename before clearing if you may need to find the session later.
  • Use /resume when returning to prior work.

Do not use /compact as a substitute for discipline. If the session contains too much irrelevant material, sometimes the best optimization is to clear and start with a clean, explicit prompt.

8. Keep CLAUDE.md small, stable, and useful

CLAUDE.md is one of the most useful Claude Code features, but it is also one of the easiest ways to waste tokens.

Claude Code loads CLAUDE.md automatically as project memory. This is ideal for stable project instructions: architecture overview, coding conventions, test commands, repository structure, style rules, domain terminology, and non-negotiable constraints. It saves you from retyping the same context in every session.

But because CLAUDE.md is loaded into context, it becomes a token tax. A large file is paid for repeatedly. If it contains long task notes, obsolete decisions, huge implementation guides, or detailed documentation for workflows you rarely use, it bloats every session.

A good CLAUDE.md should be concise. Anthropic’s official guidance recommends keeping it under 200 lines and moving specialized instructions into skills. Some practitioner materials are more permissive and mention larger thresholds, but the stronger discipline is to keep the file small enough that every line earns its place.

A good CLAUDE.md contains:

  • project purpose
  • key directories
  • architecture summary
  • build and test commands
  • coding standards
  • security rules
  • important domain terms
  • compact instructions
  • “do not” rules that prevent expensive mistakes

A poor CLAUDE.md contains:

  • long historical notes
  • old task state
  • full API documentation
  • large examples
  • detailed migration guides
  • many alternative workflows
  • verbose onboarding content
  • content only needed once a month

For example:

# Project overview

This is a FastAPI + React application for Roman coin identification.
The backend is in `backend/`.
The frontend is in `frontend/`.
MongoDB is used for catalog and vector search.
OpenAI Vision is used for image-based coin identification.

# Commands

Backend tests:
`cd backend && pytest`

Backend lint:
`cd backend && ruff check . && mypy .`

Frontend checks:
`cd frontend && npm run lint && npm run typecheck`

# Rules

- Do not refactor unrelated modules.
- Prefer minimal, testable changes.
- Add or update tests for backend behaviour changes.
- Before large changes, use plan mode.
- Preserve public API compatibility unless explicitly asked.

# Compact instructions

Preserve files changed, tests run, failing tests, decisions, and remaining TODOs.
Discard successful command noise and obsolete attempts.

This kind of file helps Claude work efficiently without becoming a dumping ground.

For specialized workflows, use separate files or skills. For example, instead of placing a full database migration manual inside CLAUDE.md, create a skill or a separate document and reference it only when needed. This keeps the base context small.

9. Move specialised instructions into skills

Skills are useful because they can provide domain-specific or workflow-specific guidance on demand. Unlike CLAUDE.md, which is loaded at the start of the session, skills can be invoked when relevant.

This matters for token usage. If your project has detailed instructions for PR reviews, database migrations, release notes, AWS IAM policy generation, security threat modeling, performance testing, or documentation generation, those instructions do not need to be present in every coding session.

A good division is:

  • CLAUDE.md contains a stable, universal project context.
  • Skills contain specialized, situational instructions.
  • Separate documentation files contain long reference material.
  • Prompts pull in only what is needed for the current task.

For example, a “database-migration” skill might include the full migration checklist, naming conventions, rollback requirements, and test strategy. Claude only needs that when working on a migration. It should not be loaded when fixing a frontend button.

A “codebase-overview” skill can also reduce exploration costs. Instead of forcing Claude to rediscover the project structure by reading many files, the skill can provide a curated map of the architecture, key directories, conventions, and common workflows. This turns repeated expensive exploration into a smaller, reusable context asset.

The principle is: do not make every session pay for every possible workflow.

10. Reduce MCP server overhead

MCP servers can be extremely useful. They connect Claude Code to external tools, systems, data sources, and workflows. But every integration has a context cost.

Anthropic’s documentation notes that MCP tool definitions are deferred by default, so only tool names enter context until Claude uses a specific tool. This helps. But MCP servers can still add overhead, especially when many are configured and available. Developers often add MCP servers as they discover them: GitHub, Supabase, browser tools, Figma, cloud tools, databases, observability systems, and internal services. Over time, the environment becomes heavy.

The practical rule is not “avoid MCP.” The rule is “use MCP intentionally.”

Run /context to see what is consuming space. Run /mcp to inspect configured servers. Disable servers that are not actively needed for the current project or task.

Prefer CLI tools when available and appropriate. Tools such as gh, aws, gcloud, kubectl, sentry-cli, psql, or local scripts can be more context-efficient because they do not require loading large tool schemas into the model context. Claude can run CLI commands directly and return concise outputs.

A good workflow is:

  • Keep project-specific MCP configuration separate.
  • Enable only the servers needed for that project.
  • Disable experimental or rarely used servers.
  • Prefer CLI commands for simple retrieval.
  • Use MCP when it provides high-value structured access.
  • Inspect context regularly.

MCP is powerful, but “connect everything” is not a cost strategy.

11. Use code intelligence plugins for typed languages

When Claude Code explores an unfamiliar codebase, it may use text search and file reads to understand symbols, references, and dependencies. This can be expensive. In typed languages, code intelligence plugins can reduce this overhead by giving Claude more precise navigation.

A “go to definition” operation can replace a broad grep followed by reading several candidate files. Type information can help Claude understand interfaces, function signatures, errors, and dependencies without scanning as much text. Language servers can also report type errors after edits, allowing Claude to catch mistakes without repeatedly running full builds or reading long compiler output.

This is especially useful in TypeScript, Java, C#, Go, Rust, Python with type hints, and other typed or partially typed codebases. It turns code navigation from “search and inspect” into “ask the language server.”

The result is not only lower token usage. It also improves accuracy. Claude is less likely to modify the wrong symbol, miss an overload, or misunderstand a type relationship.

12. Delegate verbose work to subagents carefully

Subagents can be useful because they operate in their own context window. This means verbose operations can be isolated from the main conversation. For example, a subagent can inspect logs, run tests, search documentation, or explore part of the repository, then return only a concise summary to the main session.

This can save the main context from being polluted with thousands of lines of output.

Good subagent tasks include:

  • “Run the test suite and summarise only failing tests.”
  • “Inspect this module and return the public API surface.”
  • “Search for usages of this function and summarise the call sites.”
  • “Read the migration files and identify patterns.”
  • “Compare these logs and return the top three error signatures.”

Bad subagent tasks include:

  • tiny shell commands
  • simple git status checks
  • trivial one-file edits
  • anything where the subagent overhead is larger than the task
  • broad, vague exploration with no summary format

Subagents are not automatically cheaper. They are separate Claude instances with their own context. If used casually, they can increase usage. They are cost-effective when they prevent large, noisy outputs from entering the main conversation.

When spawning a subagent, keep the prompt focused. Do not pass the entire project history. Give it the specific task, the files or commands it needs, and the required summary format.

Example:

Use a subagent to run backend tests.

Scope:

- run `cd backend && pytest`
- do not attempt fixes
- return only:
  1. number of tests run
  2. failing test names
  3. top error message for each failure
  4. likely affected files

This keeps the main conversation clean.

13. Be careful with agent teams

Agent teams can multiply token usage because each teammate runs as a separate Claude Code instance with its own context window. Anthropic’s documentation notes that token usage scales with the number of active teammates and how long each one runs. Agent teams may use approximately 7 times as many tokens as standard sessions when teammates run in plan mode.

This does not mean agent teams are bad. They are useful for parallel work, larger tasks, and role-based collaboration. But they require cost discipline.

To manage agent team costs:

  • Keep teams small.
  • Use Sonnet for teammates where possible.
  • Keep spawn prompts focused.
  • Avoid giving every teammate the full project history.
  • Clean up teams when work is done.
  • Do not leave idle teammates running.
  • Use agent teams for tasks that genuinely benefit from parallelism.

A business analogy is hiring a team of consultants. If every consultant attends every meeting, reads every document, and writes a separate report, costs rise quickly. The same applies to agent teams.

Use them when parallel work saves meaningful time or improves quality, not for routine edits.

14. Offload preprocessing to hooks and scripts

One of the best ways to save tokens is to prevent noisy data from reaching Claude in the first place.

Logs, test outputs, build outputs, stack traces, JSON dumps, CSV files, and generated files can be huge. Claude does not need to read everything. It usually needs the failing lines, error messages, relevant stack frames, changed files, or summary statistics.

Anthropic’s guidance recommends using hooks to preprocess data before Claude sees it. For example, instead of allowing Claude to read a 10,000-line test output, a hook or shell script can filter only failures. Instead of pasting the full log file, run a command to extract error lines and their surrounding context.

Examples:

pytest 2>&1 | grep -A 5 -E "(FAIL|ERROR|AssertionError)" | head -100
grep -i "error" application.log | tail -50
jq '.errors[] | {code, message, path}' response.json
git diff --stat
git diff -- src/auth/refresh.ts tests/auth/test_refresh.py

The principle is simple: use deterministic tools to reduce raw data before involving the model.

Claude is excellent at reasoning over meaningful context. It should not be used as an expensive grep, tail, or jq replacement when simple tools can reduce the input first.

This applies to business users too. Before uploading a 100-page document, extract the relevant section. Before asking Claude to analyze a whole spreadsheet, provide the relevant rows, columns, or summary. Before uploading screenshots, describe the issue or crop the image to the relevant area.

15. Write specific prompts

Prompt specificity is one of the cheapest forms of token optimization.

A vague prompt causes Claude to infer scope. In Claude Code, this may trigger broad repository exploration. In business writing, it may trigger broad analysis and long outputs. In both cases, ambiguity becomes a matter of token usage.

Bad:

Improve this.

Better:

Rewrite only the executive summary.
Keep the meaning unchanged.
Make it clearer for a project and program managers.
Limit it to 250 words.
Do not change the recommendations section.

Bad:

Fix the tests.

Better:

Fix only the failing tests in `tests/auth/test_refresh.py`.
Do not change production code unless the test failure reveals a real bug.
Run only the auth test file first.
Return a short summary of the change.

Bad:

Review this codebase.

Better:

Review the authentication module for security issues.
Focus on token expiry, refresh-token storage, password reset, and error handling.
Do not review frontend styling or unrelated API routes.
Return findings ranked by severity.

The more precise the prompt, the less Claude needs to explore, guess, and generate.

A useful structure is:

  1. Aask
  2. Scope
  3. Files or sections
  4. Constraints
  5. Output format
  6. Verification target
  7. What not to do

Example:

Task: Add validation to the invoice creation endpoint.

Scope:
- backend only
- files: `src/invoices/routes.ts`, `src/invoices/schema.ts`, tests under `tests/invoices`
- validate customer ID, line-item quantity, unit price, and currency

Constraints:
- do not refactor the invoice service
- preserve the existing API response shape
- use the current validation library

Verification:
- add tests for invalid quantity, missing customer ID, and unsupported currency
- run only invoice tests first

Output:
- brief plan
- then implementation
- then test result summary

This kind of prompt often saves more tokens than any clever trick.

16. Control output length

Output tokens are expensive and can be wasteful. Claude often tries to be helpful by explaining what it did, restating the problem, providing alternatives, and adding next steps. Sometimes this is useful. Sometimes it is just extra text.

When you do not need a long answer, say so.

Examples:

Answer in no more than 10 bullet points.
Return only the changed code block.
Return only a unified diff.
Do not explain unless there is a risk or trade-off.
Summarise the result in 5 lines.
Only rewrite section 3. Keep all other sections unchanged.

For business writing, avoid regenerating entire documents unnecessarily. If section 3 is weak, ask Claude to rewrite section 3. If the introduction needs a stronger hook, ask for three alternative introductions. If the conclusion is too long, ask only for a shorter conclusion.

For developers, avoid asking Claude to print entire files after edits unless necessary. Diffs are usually better. Summaries are often enough. Fully regenerated files burn output tokens and make review harder.

17. Work incrementally and test early

Large tasks become expensive when errors are discovered late. The model may implement many changes, run tests, discover failures, inspect logs, revise the approach, and rewrite code. Each loop costs tokens.

A better pattern is incremental development:

  1. Plan.
  2. Make a small change.
  3. Run a focused test.
  4. Fix immediately.
  5. Expand scope.
  6. Run broader tests.
  7. Compact after a completed phase.

This reduces token waste because failures are caught while the relevant context is still small.

For example, instead of asking Claude Code to “implement the full reporting engine,” break it into phases:

  1. Define template schema.
  2. Implement parser.
  3. Render simple text blocks.
  4. Add tables.
  5. Add pagination.
  6. Add headers and footers.
  7. Add charts.
  8. Add tests.
  9. Add documentation.

Each phase should have acceptance criteria. After each phase, either compact or clear, depending on whether the next phase needs the same context.

This is especially important for LLM-assisted development because the cost of ambiguity compounds. A wrong architecture implemented across ten files is expensive to unwind. A wrong plan corrected before implementation is cheap.

18. Course-correct early

When Claude starts moving in the wrong direction, stop it early.

In Claude Code, pressing Escape can interrupt a response. /rewind or double-tap Escape can restore conversation and code to a previous checkpoint. This is not only a quality feature but also a cost-control feature. Letting Claude finish a long, wrong implementation wastes tokens and creates more context that later has to be corrected or ignored.

Users often wait too long because they think, “Let’s see where it goes.” That may be fine for brainstorming, but in coding, it can be expensive. If you see Claude reading irrelevant files, refactoring too broadly, changing public APIs without permission, or running the wrong command, stop it.

Then give a correction:

Stop. This is going too broad.

Only modify `src/auth/refresh.ts`.
Do not change login, registration, or middleware.
The issue is specifically refresh-token expiry handling.
Propose a narrower plan before editing.

Early correction is one of the highest-value behaviors in Claude Code.

19. Use documents and attachments carefully

Large documents are one of the most common token traps for business users.

Before uploading or pasting a document, ask:

  • Does Claude need the whole document?
  • Can I paste only the relevant section?
  • Can I provide a summary first?
  • Can I extract the table or paragraph that matters?
  • Can I ask Claude to analyze one chapter at a time?
  • Can I remove boilerplate, headers, footers, and appendices?
  • Can I convert a screenshot into text?

PDFs, slide decks, screenshots, and Word documents can contain hidden token overhead. Screenshots can be especially expensive compared with text, and they may include irrelevant visual information. If the task is textual, text is usually better than an image.

For repeated reference material, avoid uploading the same document into many separate chats. In Claude Projects or similar environments, persistent project knowledge may be more efficient when the same material is reused frequently. For Claude Code, stable project context belongs in CLAUDE.md, skills, or referenced files, but only when it is genuinely useful.

A good business workflow is:

  1. Extract the relevant section.
  2. Ask Claude to summarise or analyze it.
  3. Store the summary as a working context.
  4. Continue with the summary instead of the original large document.
  5. Only return to the full document when needed.

For developers, the same principle applies to logs, generated files, minified files, lock files, and large JSON payloads. Do not feed raw noise to the model.

20. Understand extended thinking

Extended thinking improves performance on complex reasoning tasks but consumes additional output tokens. Anthropic’s documentation explains that thinking tokens are billed as output tokens, and the default budget can be large depending on the model. For simpler tasks, reducing the effort level or disabling thinking, where available, can reduce costs.

The practical guidance is:

  • Use higher thinking effort for architecture, planning, debugging, and complex trade-offs.
  • Use lower effort for routine edits, simple rewrites, formatting, and small code changes.
  • Avoid deep reasoning settings when the task is mechanical.
  • Use /effort or model configuration where supported.
  • Understand that some models may use adaptive reasoning and ignore nonzero fixed budgets.
  • Know that not all models allow thinking to be disabled.

Business users should also understand this pattern. Asking for “deep analysis” of a large document invites longer reasoning and output. That is useful when making an important decision, but unnecessary for simple summarisation.

For example:

Summarise this in 5 bullets.

should not require the same reasoning effort as:

Assess this acquisition strategy, identify hidden risks, challenge the assumptions, and recommend whether the board should approve it.

Use reasoning depth intentionally.

21. Team-level governance for Claude Code usage

For organizations, token efficiency should not be left entirely to individual behavior. Teams need lightweight governance.

A good team rollout should include:

  • Baseline usage measurement with a pilot group.
  • Recommended default model configuration.
  • Guidance on when to use Opus.
  • Default /model opusplan recommendation for complex development.
  • Project-level CLAUDE.md standards.
  • MCP server review process.
  • Approved skills and plugins.
  • Examples of good prompts.
  • Guidance for /clear, /compact, and /usage.
  • Rules for handling sensitive data.
  • Rate limits and spend limits were applicable.

Anthropic’s official documentation provides rate-limit recommendations by team size, with token-per-minute and request-per-minute guidance decreasing per user as the organization size grows. The reason is that not all users are active simultaneously in large organizations. This means capacity planning should consider concurrency, not just headcount.

Organizations should also pay special attention to training sessions. A live workshop where many developers use Claude Code simultaneously can create unusually high concurrent usage. This may require higher temporary limits or careful scheduling.

For API-based usage, workspace limits and cost reporting are important. For subscription usage, usage bars and plan limits matter. For enterprise cloud environments, external tracking may be required if usage metrics are not automatically sent back.

The goal is not to restrict Claude Code to the point that developers stop using it. The goal is to prevent avoidable waste while preserving productivity.

22. Recommended personal workflow for developers

Here is a practical Claude Code workflow optimized for token efficiency.

At project setup

Create a lean CLAUDE.md.

Include:

  • Project overview
  • Key directories
  • Commands
  • Coding rules
  • Testing rules
  • Compact instructions

Do not include:

  • Long documentation
  • Temporary task notes
  • Old decisions
  • Detailed manuals
  • Large examples

Configure model defaults. A strong default is:

/model opusplan

This allows Opus to be used for plan mode and Sonnet otherwise, and it can be saved as your default for new sessions.

Review MCP servers. Keep only what you need for the project.

Install relevant code intelligence plugins for typed languages.

At the start of a task

Use a scoped prompt.

Include:

  • Task
  • Relevant files
  • Boundaries
  • Output format
  • Test target
  • Whether to use plan mode

For complex tasks, enter plan mode first.

During implementation

Do not let Claude explore broadly without reason.

Stop early if it goes off track.

Run focused tests before broad tests.

Ask for diffs or summaries instead of full files.

Filter logs and test output.

Use subagents only for verbose isolated work.

Between phases

Use /compact with specific instructions.

Example:

/compact Preserve files changed, tests added, current failures, decisions, and TODOs.

Use /clear when switching to an unrelated task.

Use /rename before clearing important sessions.

Use /usage and /context regularly.

At the end

Ask Claude to produce a concise handoff summary:

Summarise:
- files changed
- behavior changed
- tests added
- commands run
- remaining risks
- suggested next task

Save this summary in a project progress file only if it is genuinely useful. Do not paste every session summary into CLAUDE.md.

23. Recommended workflow for business users

Business users can apply a similar discipline.

Start with the outcome

Instead of:

Help me with this document.

Use:

Review this proposal for executive clarity.
Focus only on:
- decision logic
- financial assumptions
- risks
- missing next steps

Return:
- top 5 issues
- suggested rewrite of the executive summary
- questions I should answer before sending

Work in sections

Do not ask Claude to regenerate a whole paper or proposal after every small change. Work section by section.

Rewrite only the “Commercial Rationale” section.
Keep the argument unchanged.
Make it more concise and board-level.
Limit to 300 words.

Reduce documents before uploading

If the document is large, provide the relevant extract first. If Claude needs more, it can ask for the specific missing section.

Avoid endless chat drift

When the topic changes, start a new chat. Long conversations are useful for continuity, but expensive when the old context is no longer relevant.

Ask for concise outputs

Give me only the final version, no explanation.

or:

Give me three options, each under 100 words.

Preserve reusable context intentionally

If you repeatedly work on the same business area, maintain a short reusable brief:

  • Company context
  • Audience
  • Tone
  • Product description
  • Key constraints
  • Preferred writing style

Keep it short. Do not paste a full company handbook into every prompt.

24. Common mistakes and better alternatives

Mistake 1: Keeping one endless conversation

Long conversations feel convenient, but they become token furnaces. Every turn may carry old context.

Better: clear or start fresh when the task changes. Compact when continuing the same task.

Mistake 2: Bloated CLAUDE.md

A huge CLAUDE.md feels helpful, but taxes every session.

Better: keep only stable essentials in CLAUDE.md and move specialized instructions into skills or separate files.

Mistake 3: Using Opus for everything

Opus is powerful, but using it for every routine edit is inefficient.

Better: use /model opusplan and reserve Opus for planning and complex reasoning, use Sonnet for execution.

Mistake 4: Asking Claude Code to “look around”

Broad exploration consumes tokens quickly.

Better: point Claude to likely files and define the scope.

Mistake 5: Pasting full logs

Raw logs are noisy.

Better: filter logs with grep, tail, jq, or scripts before giving them to Claude.

Mistake 6: Regenerating whole documents

Full rewrites burn output tokens and make review difficult.

Better: revise specific sections.

Mistake 7: Too many MCP servers

Every integration can add overhead.

Better: enable only relevant MCP servers and inspect context with /context.

Mistake 8: Using subagents for tiny tasks

Subagents have overhead.

Better: use them for noisy, isolated work, not trivial commands.

Mistake 9: Waiting too long to correct Claude

A wrong implementation path grows expensive.

Better: interrupt early and redirect.

Mistake 10: Not measuring usage

Without /usage, optimization is guesswork.

Better: check usage and context regularly.

25. The operating model: spend tokens where they create value

The best token strategy is not “use as few tokens as possible.” That would be the wrong goal. The goal is to spend tokens where they create value.

Good token spending:

  • Opus is thinking through a hard architecture decision.
  • Claude is reading the right files to fix a serious bug.
  • Generating tests that prevent regressions.
  • Reviewing a high-stakes proposal.
  • Summarising a complex but relevant document.
  • Comparing design alternatives.
  • Producing a useful implementation plan.

Bad token spending:

  • Re-reading stale conversation history.
  • Carrying obsolete logs.
  • Loading unused MCP servers.
  • Keeping bloated CLAUDE.md content.
  • Rewriting whole documents unnecessarily.
  • Exploring unrelated repository areas.
  • Printing full files when a diff is enough.
  • Using Opus for routine edits.
  • Letting agent teams run without a clear scope.
  • Asking vague questions that force broad inference.

This distinction matters because token optimization should not reduce quality. In fact, the best practices usually improve quality. Clear scope, smaller context, better model selection, early planning, focused tests, and concise outputs make Claude more effective.

26. Practical checklist

Use this checklist before and during Claude Code work.

Before starting

  • Is this task related to the current session?
  • Should I /clear first?
  • Is the task complex enough for plan mode?
  • Is /model opusplan enabled?
  • Is CLAUDE.md lean and relevant?
  • Are unnecessary MCP servers disabled?
  • Do I know the relevant files?
  • Can I provide acceptance criteria?

During work

  • Is Claude reading relevant files only?
  • Is output too verbose?
  • Should I ask for a diff instead of a full file?
  • Are logs filtered?
  • Are tests focused?
  • Should a verbose task be delegated to a subagent?
  • Should I stop Claude because it is drifting?

After a phase

  • Should I /compact?
  • What should compaction preserve?
  • Should I save a short progress note?
  • Should I /clear before the next task?
  • What did /usage show?
  • What did /context show?

For business writing

  • Am I asking for the whole document or only one section?
  • Did I provide the target audience?
  • Did I define the output length?
  • Did I specify what should not change?
  • Can I extract relevant text instead of uploading a large file?
  • Is this conversation still focused?

Conclusion

Claude and Claude Code are most effective when treated not as magical chat boxes, but as context-driven reasoning systems. Tokens are the fuel for that reasoning. If the context is clean, scoped, and relevant, tokens are spent on useful work. If the context is bloated, stale, and vague, tokens are wasted before the model even begins solving the problem.

For business users, the discipline is to provide focused context, work in sections, avoid unnecessary attachments, control output length, and start fresh when the topic changes.

For developers, the discipline is to keep CLAUDE.md lean, use /model opusplan, reserve Opus for planning and hard reasoning, use Sonnet for most implementation, inspect usage with /usage, inspect context with /context, compact proactively, clear between unrelated tasks, filter noisy outputs, manage MCP servers, and use subagents only when their isolation saves the main context from noise.

The practical philosophy is simple:

  • Use the strongest model for the thinking that matters.
  • Use the cheaper model for routine execution.
  • Keep context small.
  • Be specific.
  • Measure usage.
  • Stop wrong work early.
  • Do not make every future prompt pay for every past detail.

Claude Code rewards users who work like good engineers and good managers: clear scope, clean context, deliberate tools, early feedback, and disciplined execution. That is how you save tokens without sacrificing quality.