Prompt engineering gets the attention. Context engineering does the real work. The distinction matters because it changes what you optimise for. Prompt engineering asks: how do I phrase this request? Context engineering asks: what information does this model need, in what structure, to produce the output I need reliably?
These are different questions. The second one is harder, more consequential, and almost entirely responsible for whether a production LLM system behaves the way you want.
What Context Engineering Actually Is
Every LLM call has a context window. What goes into that context window is the context. In simple cases, that is just the user's message. In production systems, the context is assembled from multiple sources: system prompts, retrieved documents, conversation history, tool outputs, structured data, and user inputs.
Context engineering is the discipline of deciding what goes into that context, in what order, in what format, and at what level of detail. It is information architecture applied to the constraints of a finite context window.
The context window is not unlimited. Every token you use on something irrelevant is a token not available for something useful. Context engineering is fundamentally about prioritisation under constraints.
The Context Budget
Start with a budget. For a given model with a given context limit, decide in advance how many tokens to allocate to each component:
- System prompt and role definition: fixed overhead
- Retrieved documents: variable, largest allocation
- Conversation history: variable, pruned by relevance and recency
- Current user input: variable, usually small
- Output buffer: reserved, never filled with input
The errors most systems make: no retrieval strategy (everything or nothing), conversation history that grows unbounded until it hits the limit, and system prompts that expand over time without a corresponding reduction elsewhere.
Retrieval Strategy
For systems that retrieve information before calling the model, the retrieval quality determines the context quality. A retrieval system that returns the ten most semantically similar chunks regardless of relevance, recency, or diversity will fill the context with redundant and tangentially relevant information.
Better retrieval principles for context engineering:
Score on relevance AND recency. A document from three years ago that is semantically similar to the query may be less useful than a more recent document that is somewhat less similar.
Deduplicate before injecting. Multiple chunks from the same source saying the same thing in slightly different ways waste tokens without adding information.
Position matters. Models attend more to the beginning and end of long contexts. The most important retrieved information should appear near the top.
Conversation History Management
Conversation history in multi-turn systems is the most common cause of context degradation. The naive approach is to append every exchange and eventually hit the context limit. The correct approach is to manage history as a rolling, relevance-ranked window.
Three patterns that work:
Summarise older turns. When history gets long, replace the oldest N turns with a brief summary. The model loses verbatim recall of those turns but retains the substance.
Keep only task-relevant turns. If the conversation drifted into small talk three turns ago, those turns are not relevant to the current task. Remove them.
Use explicit memory for facts. Rather than relying on the context to carry important facts mentioned earlier, extract them to a structured memory and inject them as a concise fact sheet at the top of each turn.
Structural Information Architecture
How you structure information in the context affects model behaviour. Three principles:
Instructions before documents. The model should understand what to do before it processes what to work with.
Use XML or Markdown delimiters. Clearly delineated sections reduce ambiguity about where one type of information ends and another begins. Tags like <documents>, <conversation_history>, and <instructions> are more reliable than prose transitions.
Put the most important constraint last. Models exhibit recency bias in long prompts. The most critical instruction belongs near the end of the system prompt or immediately before the user input.
Context Engineering Is a System Property
Individual prompts can be improved through prompt engineering. But the context quality of a production system depends on decisions made across the entire pipeline: how documents are chunked and indexed, how history is stored and pruned, how tool outputs are formatted, how retrieved results are ranked and filtered.
These decisions compound. A system with excellent retrieval and poor history management will still exhibit context degradation at scale. Context engineering is a system-level discipline. Treating it as a per-prompt concern is why most production LLM systems plateau in quality.
