Have you ever been stopped cold by this error in Claude Code or the API?

Prompt is too long

# On the API, more specifically:
prompt is too long: 233153 tokens > 200000 maximum

"Prompt is too long" means the input you are trying to send (conversation history + attached files + tool definitions, etc.) exceeds the model's context window (the input limit). On the API, the message even includes the token count and the limit, as in 233153 tokens > 200000 maximum. This is different from a usage limit: you have not run out of quota; a single input is simply physically too big.

Three takeaways up front. (1) The cause is "the input does not fit the window." It is not the max_tokens output cutoff, nor the usage limit quota. (2) Claude Code normally avoids it automatically via auto-compact (auto-summarization), so when you see it, you either "blew past the window at once" or have turned auto-compact off. (3) The fastest fixes are /compact to summarize history, /clear to start fresh, and offloading huge reads to a subagent. This article covers what fills the window, the window sizes (200K and 1M), how to fix it, and how to tell it apart from confusable errors — based on official information.

[Figure: Claude Code context window, "Until the window is full." The system prompt, CLAUDE.md, MCP tool defs, files you read, tool results, and the growing conversation stack up to 100% (200K or 1M tokens), at which point "Prompt is too long" appears. Normally auto-compact summarizes before overflow. A full input window is not a usage limit (quota) and not an output cutoff (max_tokens). Stack ratios are illustrative; check the real breakdown with /context.]

1. What this error is telling you

AI models have an input limit called the "context window." It is "the maximum amount of information that can be read in a single exchange," counted in tokens (roughly fragments of words). Prompt is too long means the total tokens of the input you tried to send exceed that window. On the API it even prints the numbers: 233153 tokens > 200000 maximum (you sent 233,153 tokens; the limit is 200,000).
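To make the numbers in that message concrete, here is a minimal Python sketch that pulls them out and computes how far over the limit the input was. The regular expression is an assumption based only on the example string quoted in this article; real messages may be worded differently, and `parse_overflow` is a hypothetical helper name.

```python
import re

# Pattern for the numeric form quoted in this article:
#   "prompt is too long: 233153 tokens > 200000 maximum"
# Assumption: the wording of real messages matches this example.
_OVERFLOW_RE = re.compile(
    r"prompt is too long:\s*(\d+)\s*tokens\s*>\s*(\d+)\s*maximum",
    re.IGNORECASE,
)


def parse_overflow(message):
    """Return (sent, limit, excess), or None if the message has no numbers."""
    match = _OVERFLOW_RE.search(message)
    if match is None:
        return None
    sent, limit = int(match.group(1)), int(match.group(2))
    return sent, limit, sent - limit


result = parse_overflow("prompt is too long: 233153 tokens > 200000 maximum")
print(result)
```

The third value is the amount you have to remove from the input before it fits, which is a useful target when deciding how much history or file content to drop.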

The key point is that this is about the input side. The context window sums up conversation history, attached/read files, tool execution results, the system prompt, and MCP tool definitions. Keep a long conversation going, read a giant file whole, or pile up lots of tool output, and the window fills gradually and overflows at some point. For the concept itself, see What is a context window.

Note that Claude Code usually has auto-compact (auto-summarization) on by default, which automatically summarizes history to free space as the window nears full. So normally you never see this error. If it still appears, it is usually because (1) a single input blew past the window at once (e.g., pasting a giant file), or (2) you disabled auto-compact (DISABLE_AUTO_COMPACT).
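The two failure cases can be illustrated with a toy simulation. This is only a sketch of the reasoning, not how Claude Code implements auto-compact: the trigger point (160,000 tokens) and the assumption that a summary keeps a quarter of the history are invented for illustration, and `run_session` is a hypothetical name.

```python
WINDOW = 200_000        # window size from the error message in this article
COMPACT_AT = 160_000    # hypothetical trigger point, not an official value
KEEP_DIVISOR = 4        # hypothetical: a summary keeps 1/4 of the history


def run_session(inputs, auto_compact=True):
    """Feed token counts one by one; return 'ok' or the index that overflowed."""
    used = 0
    for index, tokens in enumerate(inputs):
        if auto_compact and used >= COMPACT_AT:
            used = used // KEEP_DIVISOR  # summarize before adding more
        if used + tokens > WINDOW:
            return index  # this input no longer fits: "Prompt is too long"
        used += tokens
    return "ok"


gradual = [10_000] * 40
print(run_session(gradual, auto_compact=True))    # gradual growth, compaction on
print(run_session(gradual, auto_compact=False))   # same growth, compaction off
print(run_session([50_000, 233_153]))             # one giant input at once
```

With compaction on, steady growth never overflows in this model. Turning it off, or adding a single input larger than the whole window, is what produces the error.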

2. What fills the context window

The window overflows faster than expected because elements you never see consume it too. Here is the main breakdown from Claude Code's official docs.

| What fills the window | Contents | How to lighten it |
| --- | --- | --- |
| Conversation history | Every user/assistant turn. The biggest factor: it keeps growing until cleared | /compact to summarize, /clear to restart |
| Files you read | Every file you Read goes into the window. Reading a giant file whole is heavy | Read by line ranges; offload big reads to a subagent |
| Tool results | Command output, search results, etc. accumulate too | Avoid huge unnecessary output; compact often |
| MCP tool definitions | Tool defs of connected MCP servers. The more servers, the more they eat from the start | Disable unused MCP with /mcp |
| CLAUDE.md / memory | Project/global instructions, auto-memory. Always loaded | Avoid bloat; check with /doctor |
| System prompt | Core behavior instructions. Always present, fixed, untouchable | (Cannot trim. Reduce the rest) |

The point: "conversation history, file reads, tool results" are dynamic factors that grow, while "MCP defs, CLAUDE.md, system prompt" are fixed factors present from the start. The trick is that a subagent has its OWN window — offload a giant file read or investigation to a subagent and its result (the heavy raw data) never enters your main window. See exactly what is eating the window with /context. For the discipline of designing context deliberately, see context engineering.
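The fixed/dynamic split and the effect of a subagent can be put into a small back-of-the-envelope model. Every token count below is a made-up illustrative number, not a measurement, and the function names are hypothetical; use /context for the real breakdown.

```python
# All token counts are invented for illustration only.
WINDOW = 200_000

fixed = {"system prompt": 3_000, "CLAUDE.md": 2_000, "MCP tool defs": 15_000}
dynamic = {"conversation": 60_000, "files you read": 40_000, "tool results": 20_000}


def used_tokens(fixed_load, dynamic_load):
    """Total tokens currently occupying the window."""
    return sum(fixed_load.values()) + sum(dynamic_load.values())


def after_big_read(current, raw_tokens, summary_tokens, via_subagent):
    """Window usage after investigating a large file.

    Read directly, the raw content lands in the main window. Delegated to
    a subagent, only the returned conclusion does.
    """
    return current + (summary_tokens if via_subagent else raw_tokens)


current = used_tokens(fixed, dynamic)
direct = after_big_read(current, 80_000, 1_500, via_subagent=False)
offloaded = after_big_read(current, 80_000, 1_500, via_subagent=True)
print(current, direct > WINDOW, offloaded > WINDOW)
```

In this example the same investigation overflows the window when read directly but barely moves it when only the subagent's conclusion comes back.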

3. Window sizes — 200K and 1M

The maximum depends on the model. Here is the big picture as of 2026 (specific values can be revised, so confirm the latest official list).

200K vs 1M: the window can differ by 5x

Standard (200K tokens): Sonnet 4.5, Haiku 4.5, Opus 4.5, etc. The "200000 maximum" you see in the error is this. Plenty for most day-to-day work, but it overflows easily on huge codebases or long sessions.

1M tokens: Opus 4.8/4.7/4.6, Sonnet 4.6, etc. 5x the standard. As of 2026 it is available at standard pricing (no long-context surcharge currently). In Claude Code it appears with a [1m] suffix.

Caution (1M is not a cure-all): (1) On subscriptions, a [1m] model may require usage credits. (2) Newer models use a changed tokenizer that consumes roughly 30-35% more tokens for the same text (so even 1M holds less than it feels). Before widening the window, the basic move is to not clutter it.

Window sizes, 1M support, and pricing get revised over time. Do not memorize fixed values — confirm in the latest official model list.
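A quick calculation shows what the tokenizer caveat means in practice. This sketch takes the 30-35% figure from this article at face value and assumes, for the sake of comparison, that the 200K baseline is measured with the older tokenizer; `effective_capacity` is a hypothetical helper.

```python
def effective_capacity(window_tokens, inflation):
    """Window size expressed in 'old tokenizer' tokens.

    inflation is the extra token share for the same text, e.g. 0.30 for +30%.
    """
    return window_tokens / (1 + inflation)


for inflation in (0.30, 0.35):
    old_equiv = effective_capacity(1_000_000, inflation)
    print(f"+{inflation:.0%}: about {old_equiv:,.0f} old-tokenizer tokens, "
          f"{old_equiv / 200_000:.1f}x a 200K window")
```

Under those assumptions, a 1M window holds roughly as much text as 740K to 770K tokens did before, so the practical gain over 200K is closer to 3.7x to 3.8x than 5x.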

It is tempting to think "switching to a 1M model solves everything," but a bigger window is an escape, not always a solution. Widen the window while keeping a cluttered conversation, wasteful whole-file reads, and unused MCP, and you only raise cost and slow responses. The skillful approach is to first tidy the window (compact, clear, subagents), and use 1M only for the genuinely large tasks that still need it.

4. How to fix it now

Moves for the moment the error appears, in priority order. Pick by situation (history ballooned / you fed in a giant file).

FIXES: how to free the window

1) /compact (first): Summarize history to free space. You can focus it: /compact focus on the auth bug. Keeps context while slimming down.
2) /clear (new task): Wipe the conversation. CLAUDE.md and project info remain. Fastest when moving to unrelated work.
3) Offload big reads: Read giant files by line range, or have a subagent investigate and return only the conclusion (it uses its own window).
4) Trim the fixed load: Use /context to see the breakdown, then disable unused MCP and slim CLAUDE.md. /doctor flags bloat.
5) A 1M model if it really is huge: Only when you genuinely need it (e.g., handling a whole large codebase), switch with /model to a 1M-context model. But do the tidying (1-4) first. Do not disable auto-compact (keep it on by default).

Default to 1) /compact then 2) /clear. If the overflow is mainly a "big read," use 3). If it is chronic, trim the fixed load with 4).
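The "read by line range" idea in fix 3 applies equally to your own scripts that prepare input for the API. The following is a minimal Python sketch, not part of Claude Code; `read_line_range` is a hypothetical helper, and an in-memory `io.StringIO` stands in for a large file.

```python
import io
from itertools import islice


def read_line_range(lines, start, end):
    """Return lines start..end (1-indexed, inclusive) from any line iterable.

    islice stops consuming the iterable after `end`, so when `lines` is an
    open file the remainder is never read.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return list(islice(lines, start - 1, end))


# Stand-in for a large file; with a real file use open(path, encoding="utf-8").
big_file = io.StringIO("".join(f"line {n}\n" for n in range(1, 10_001)))
excerpt = read_line_range(big_file, 120, 122)
print("".join(excerpt), end="")
```

Sending only the three relevant lines instead of all 10,000 is the same trade the article recommends inside Claude Code: keep the raw bulk out of the window and pass in only what the task needs.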

Note: /compact itself can fail with "Conversation too long. Press esc twice..." — that means the window is already so full there is no room even to insert a summary. In that case, press Esc twice to go up a few messages, or /clear to restart. For systematic token saving, see Claude Code token saving.

5. Telling apart three confusable errors

The "too long / stalled" family has several members, and the fixes can be opposite. Distinguish these three (+ one) so you do not confuse them.

| Symptom | What it really is | Main fix |
| --- | --- | --- |
| Prompt is too long / N tokens > M maximum | This article's topic. The input exceeded the context window | /compact, /clear, offload big reads to a subagent, 1M model |
| Response cut off (stop_reason: max_tokens) | The output was truncated at the max_tokens you set in the request (not a window problem) | Raise max_tokens / ask it to continue |
| usage limit reached | Your plan's usage quota is spent (unrelated to the token window) | Wait for reset; usage limit fixes |
| Usage credits required for 1M context | An entitlement matter. You picked a [1m] model not included in your plan (not overflow, not quota) | Enable credits, or /model to a standard window |

The axis: if you see numbers like "N tokens > M maximum," it is input overflow = this article. A cleanly truncated response is the output cap (max_tokens). "reset at [time]" is a usage limit. "credits required for 1M" is an entitlement (plan) matter. For other common Claude Code errors, see the error roundup.
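That decision axis can be written down as a small triage function. This is a heuristic sketch that matches only the strings quoted in this article; actual messages may be worded differently, and `classify` and its return labels are hypothetical.

```python
def classify(message="", stop_reason=None):
    """Map a symptom to one of the four cases distinguished above.

    Assumption: messages contain the phrases quoted in this article.
    """
    text = message.lower()
    if "credits required for 1m" in text:
        return "entitlement"       # [1m] model not included in the plan
    if "usage limit" in text:
        return "usage limit"       # quota spent; wait for the reset
    if "prompt is too long" in text:
        return "input overflow"    # free the window: /compact, /clear
    if stop_reason == "max_tokens":
        return "output cap"        # raise max_tokens or ask to continue
    return "unknown"


print(classify("prompt is too long: 233153 tokens > 200000 maximum"))
print(classify(stop_reason="max_tokens"))
print(classify("usage limit reached"))
print(classify("Usage credits required for 1M context"))
```

The order of the checks matters little here because the phrases do not overlap; the point is that each symptom maps to a different fix, so it pays to identify the case before acting.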

6. Prevention checklist

Habits to keep the window from overflowing.

(1) Keep auto-compact on by default (do not turn it off with DISABLE_AUTO_COMPACT).
(2) /clear at task boundaries; /compact often mid-conversation.
(3) Read giant files by line range or via a subagent; do not paste them whole.
(4) Disable unused MCP and do not let CLAUDE.md bloat (check with /doctor).
(5) Check the breakdown with /context before heavy work.
(6) Use a 1M model only for genuinely large tasks; run on the standard window + tidying the rest of the time.

Summary

Claude Code / API's "Prompt is too long" means the input (conversation history + files + tool definitions, etc.) exceeded the model's context window. On the API it even shows the cap as N tokens > M maximum. It is neither a usage limit (quota) nor an output cutoff (max_tokens) — it is "the input is physically too big." Claude Code usually avoids it via auto-compact, so when it appears you either blew past the window at once or turned auto-compact off.

The window is filled by conversation history, file reads, tool results (dynamic) + MCP defs, CLAUDE.md, system prompt (fixed). The fastest fixes are (1) /compact -> (2) /clear -> (3) offload big reads to a subagent -> (4) trim the fixed load with /context -> (5) a 1M model only if truly needed. Window sizes are standard 200K and 1M; 1M is at standard pricing as of 2026, but note that subscriptions may require credits and the new tokenizer consumes more. The basic rule: before widening the window, stop cluttering it. Related: What is a context window, context engineering, usage limit fixes.

FAQ

Q. Are "Prompt is too long" and "usage limit reached" the same thing?
A. Completely different. "Prompt is too long" means a single input exceeded the context window (the token limit). "usage limit reached" means you spent your plan's usage quota — unrelated to the token window. The former is fixed instantly by freeing the window with /compact or /clear; the latter needs waiting for a reset or a plan action.

Q. It never appears normally, then suddenly showed up. Why?
A. Claude Code has auto-compact on by default, which auto-summarizes history to avoid it as the window nears full. If it still appears, it is usually because (1) you fed in a giant file or huge amount of data at once and blew past the window, or (2) you turned auto-compact off with DISABLE_AUTO_COMPACT. Fix the former with splitting / line-range reads / a subagent, the latter by re-enabling auto-compact.

Q. I ran /compact and got "Conversation too long" — it cannot even summarize.
A. The window is already so full there is no room even to insert a summary. Press Esc twice to go up a few messages and retry, or /clear to restart the conversation. From then on, /compact before it fills and offload big reads to a subagent to prevent recurrence.

Q. Will switching to a 1M-context model solve it?
A. It helps for large tasks, but it is not a cure-all. Widening the window while keeping a cluttered history, wasteful whole-file reads, and unused MCP only raises cost and slows responses. Also, newer models use a changed tokenizer that uses roughly 30-35% more tokens for the same text, so it holds less than it feels. The smart play is tidy first (compact/clear/subagent), then use 1M only when truly needed. Note that subscriptions may require credits for [1m].

Q. I want to know what is eating the window.
A. Claude Code's /context shows the breakdown — system prompt, CLAUDE.md, MCP tool definitions, conversation history, files you read, etc. In most cases the ever-growing conversation history and large file reads are the main culprits. /doctor also flags a bloated CLAUDE.md or subagent definitions. If the fixed load (MCP defs, CLAUDE.md) is large, trimming that is effective.