The Lookback Mechanism
In-context recall asks the model to do three things in sequence: mark a source so it can be found again, carry something that points back to it, and fetch the source's payload at the moment it is needed. Three topics that are usually studied separately (entity binding, retrieval heads, and the lookback motif) each answer one of those jobs. Binding is how the source is marked. A retrieval head is the hardware that fetches. Lookback is the protocol that links the mark to the fetch by treating the binding as a pointer.
The example below is the smallest case that needs all three. Two characters each act on an object; a later query names one character and must report that character's object, not the nearer one. Surface proximity favors the wrong answer, so getting it right requires a stored reference rather than a local guess.
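The failure mode can be made concrete with a toy contrast in plain Python. The story, the function names `proximity_guess` and `binding_lookup`, and the idea of a literal lookup table are hypothetical simplifications; a model stores nothing like a Python list, but the contrast shows why a recency-based guess picks the wrong object.

```python
# Hypothetical toy story: two (character, object) events, then a query.
story = [("Alice", "box"), ("Bob", "jar")]

def proximity_guess(story, query_character):
    # Surface heuristic: report the object mentioned most recently before the
    # query, ignoring which character the query names.
    return story[-1][1]

def binding_lookup(story, query_character):
    # Stored reference: find the event whose character matches the query,
    # then report the object bound to that character.
    for character, obj in story:
        if character == query_character:
            return obj
    raise KeyError(query_character)

print(proximity_guess(story, "Alice"))  # jar (Bob's object, the wrong answer)
print(binding_lookup(story, "Alice"))   # box (Alice's object)
```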
1. Binding stores the address
Before anything can be retrieved, the source has to be marked. The model binds an entity to its attributes by co-locating their reference information in the residual stream of a single token, encoded as an abstract identifier — a binding ID, or in Prakash and coauthors' terms an ordering ID — that lives in a low-rank subspace. The identifier is variable-like: it does not name a position, so it survives intervening tokens. The causal test is the swap. Exchange two entities' binding IDs and the model reports the swapped attribute, which is the signature of a genuine binding variable rather than a positional cue. This is the storage step, covered in more detail on the binding page.
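The swap test can be sketched with a toy vector model in plain Python. Everything here is an assumption made for illustration: the 4-dimensional vectors, the choice of dims 2-3 as the binding subspace, the one-hot binding IDs, and the helper names `bind` and `query`. Learned binding IDs are only approximately separable from content, so this shows the logic of the intervention rather than how it is carried out on a real model.

```python
# Hypothetical toy: each residual vector has 4 dims. Dims 0-1 carry content;
# dims 2-3 are an assumed low-rank "binding subspace" holding the binding ID.
BID = {0: [0.0, 0.0, 1.0, 0.0], 1: [0.0, 0.0, 0.0, 1.0]}  # two IDs in this toy
CONTENT = {
    "Alice": [1.0, 0.0, 0.0, 0.0], "Bob": [0.0, 1.0, 0.0, 0.0],
    "box": [1.0, 1.0, 0.0, 0.0], "jar": [1.0, -1.0, 0.0, 0.0],
}

def add(u, v):
    return [x + y for x, y in zip(u, v)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def binding_part(v):
    # Project onto the binding subspace by zeroing the content dims.
    return [0.0, 0.0, v[2], v[3]]

def bind(pairs):
    # The i-th (entity, attribute) pair gets binding ID i on both members.
    entities = {e: add(CONTENT[e], BID[i]) for i, (e, _) in enumerate(pairs)}
    attributes = {t: add(CONTENT[t], BID[i]) for i, (_, t) in enumerate(pairs)}
    return entities, attributes

def query(entity, entities, attributes):
    # Read the entity's binding ID and return the attribute whose ID matches best.
    key = binding_part(entities[entity])
    return max(attributes, key=lambda t: dot(key, binding_part(attributes[t])))

entities, attributes = bind([("Alice", "box"), ("Bob", "jar")])
print(query("Alice", entities, attributes))  # box

# Swap intervention: exchange the binding-subspace components of the two
# entities while leaving their content dims untouched.
a, b = entities["Alice"], entities["Bob"]
entities["Alice"] = a[:2] + b[2:]
entities["Bob"] = b[:2] + a[2:]
print(query("Alice", entities, attributes))  # jar
```

Because only the binding components moved, the flipped answer is attributable to the identifier rather than to the entity's content or position.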
2. Retrieval heads do the fetch
The fetch is performed by a small, distinctive set of attention heads. Across model families and scales, fewer than about five percent of heads carry most long-context recall; these heads already exist in models pretrained on short contexts, and the same heads keep doing the retrieval when the context window is extended. Functionally, a retrieval head performs a conditional copy-paste: when its query matches a source, the OV path copies the source token forward. Masking these heads degrades recall progressively, from complete to partial retrieval to a fluent but unsupported answer, while masking the same number of random heads leaves retrieval largely intact; this ties them to factuality rather than to fluency. A retrieval head is the conditional-copy generalization of the induction head, and it appears in the catalog of attention-head labels as one of the better causally tested kinds.
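One way to pick out retrieval heads is to count how often a head's strongest attention lands on exactly the context token being generated. The sketch below loosely follows the copy-paste criterion Wu and coauthors use to score heads, in simplified form. The function name, the toy context, and the attention rows are hypothetical, and repeated needle tokens are counted once here, which may differ from the paper's bookkeeping.

```python
def retrieval_score(context, needle_span, generated, attention):
    """Fraction of distinct needle tokens that one head copy-pastes to the output.

    context:     input tokens
    needle_span: (start, end) positions of the needle in context, end exclusive
    generated:   output tokens, one per decoding step
    attention:   attention[step][j] is this head's weight on context[j]
    """
    start, end = needle_span
    needle = set(context[start:end])
    copied = set()
    for step, token in enumerate(generated):
        # Criterion 1: the token being generated must come from the needle.
        if token not in needle:
            continue
        # Criterion 2: the head's most-attended position lies inside the needle
        # and holds the very token being generated.
        row = attention[step]
        j = max(range(len(row)), key=lambda i: row[i])
        if start <= j < end and context[j] == token:
            copied.add(token)
    # copied is always a subset of needle, so this is |copied & needle| / |needle|.
    return len(copied) / len(needle)

context = ["filler", "the", "code", "is", "4", "7", "filler"]
needle_span = (4, 6)    # the needle is "4 7"
generated = ["4", "7"]  # the answer, one token per decoding step

# Hypothetical attention rows (one per decoding step) for two heads.
copying_head = [[0.05, 0.05, 0.1, 0.1, 0.6, 0.05, 0.05],
                [0.05, 0.05, 0.1, 0.1, 0.05, 0.6, 0.05]]
other_head = [[0.7, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05],
              [0.7, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05]]

print(retrieval_score(context, needle_span, generated, copying_head))  # 1.0
print(retrieval_score(context, needle_span, generated, other_head))    # 0.0
```

Scoring every head this way over many needle prompts and keeping the high scorers yields a sparse candidate set, which masking experiments can then test causally.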
3. Lookback joins the two
Lookback is the protocol that turns a stored binding into a completed retrieval. When the source is recalled, its reference is copied into the residual stream of the recalled token, where it serves as an address stored next to that token's payload. At the later position that needs the information, a matching copy of the reference is formed as a pointer. A retrieval head then dereferences the pointer: it attends from the pointer back to the token whose address matches and copies the bound payload forward. The figure walks the four states: bind, pointer, dereference, emit.
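The four states can be traced in a toy sketch in plain Python. The one-hot addresses, the `sharpness` factor standing in for learned query and key projections, and the collapse of the OV copy into an argmax over discrete payloads are all illustrative assumptions; in a real model the address, pointer, and payload are directions in activation space.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# 1. Bind: each recalled token stores an address (assumed one-hot here)
#    next to its payload.
ADDRESS = {"Alice": [1.0, 0.0], "Bob": [0.0, 1.0]}
tokens = [
    {"address": ADDRESS["Alice"], "payload": "box"},
    {"address": ADDRESS["Bob"], "payload": "jar"},
]

# 2. Pointer: the query position forms a copy of the queried entity's address.
def make_pointer(entity):
    return list(ADDRESS[entity])

# 3. Dereference: attend from the pointer to every token, scoring how well each
#    address matches the pointer (the QK step).
def dereference(pointer, tokens, sharpness=5.0):
    return softmax([sharpness * dot(pointer, t["address"]) for t in tokens])

# 4. Emit: copy forward the payload of the most-attended token (the OV step,
#    reduced here to an argmax over discrete payloads).
def emit(weights, tokens):
    best = max(range(len(tokens)), key=lambda i: weights[i])
    return tokens[best]["payload"]

weights = dereference(make_pointer("Alice"), tokens)
print(emit(weights, tokens))  # box
```

Swapping in `make_pointer("Bob")` changes only the pointer, and the same dereference step then returns `jar`.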
The motif recurs beyond the single example. Prakash and coauthors find it implementing belief tracking: a query about one character's view of an object dereferences the binding set up earlier in the passage, and a visibility relation is itself stored as an ID and retrieved by a further lookback. Later work studies the pointers themselves, including how the dereferenced payload is brought into the residual stream of the final token, and how lookback is rebound when an entity's state changes mid-context.
4. Why read it as one mechanism
Treating binding, retrieval heads, and lookback as one mechanism is what lets a result from one line of work bear on the others. A swap intervention shows the address is real; a head-masking experiment shows the fetch is real; a pointer-patching experiment shows the dereference is real. Each is evidence about a different stage of the same pointer-and-payload algorithm, and a claim about in-context recall is strongest when it names which stage it tests. The mechanism still varies by model and task, so the motif is a template for analysis rather than a fixed circuit.
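A deliberately simplified sketch in plain Python shows how the three interventions act on different stages and leave different signatures. The `recall` function, its keyword switches, and the fallback to the nearest object under masking are illustrative assumptions, not a model of any real network; in practice each intervention operates on activations, not on a lookup table.

```python
# Toy story with four entities so that each intervention leaves a distinct answer.
story = [("Alice", "box"), ("Bob", "jar"), ("Carol", "cup"), ("Dan", "bag")]

def recall(story, query, swap=None, mask_head=False, patch_pointer=None):
    # Bind: the i-th object token stores address i, and its entity carries ID i.
    # The swap intervention exchanges two entities' IDs (the address stage).
    address_to_object = {i: obj for i, (_, obj) in enumerate(story)}
    entity_id = {entity: i for i, (entity, _) in enumerate(story)}
    if swap is not None:
        a, b = swap
        entity_id[a], entity_id[b] = entity_id[b], entity_id[a]
    # Pointer: the query position copies the queried entity's ID; pointer
    # patching overwrites it with another entity's ID (the dereference stage).
    source = patch_pointer if patch_pointer is not None else query
    pointer = entity_id[source]
    # Fetch: with the retrieval head masked, nothing dereferences the pointer and
    # the answer falls back to a proximity guess, the most recent object.
    if mask_head:
        return story[-1][1]
    # Dereference and emit: copy the payload stored at the matching address.
    return address_to_object[pointer]

runs = [
    ("baseline", {}),
    ("swap binding IDs (address)", {"swap": ("Alice", "Bob")}),
    ("mask retrieval head (fetch)", {"mask_head": True}),
    ("patch pointer (dereference)", {"patch_pointer": "Carol"}),
]
for label, kwargs in runs:
    print(f"{label}: {recall(story, 'Alice', **kwargs)}")
```

For the query about Alice, the baseline gives `box`, the swap gives `jar` (Bob's object), masking gives `bag` (the nearest object), and the patch gives `cup` (Carol's object). The story is chosen so that the three signatures differ; with fewer entities, different interventions could produce the same wrong answer, and only the intervention site would tell them apart.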
- Feng and Steinhardt (2024), "How do Language Models Bind Entities in Context?", for binding-ID interventions.
- Wu, Wang, Xiao, Peng, and Fu (2025), "Retrieval Head Mechanistically Explains Long-Context Factuality", for sparse, universal, intrinsic retrieval heads and masking experiments.
- Prakash, Shaham, Rager, Geva, Bau, and coauthors (2025), "Language Models Use Lookbacks to Track Beliefs", for the lookback motif and ordering IDs.
- Feng, Geiger, and coauthors (2025), "Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context", for the pointers used in lookback.
- See Binding for the dependency the mechanism resolves and the swap intervention.
- See Attention Head Labels for retrieval heads alongside other head labels.
- See Induction Heads for the copy circuit that retrieval heads generalize.