The Lookback Mechanism

Store an address, carry a pointer, look back to dereference it.

In-context recall asks the model to do three things in sequence: mark a source so it can be found again, carry something that points back to it, and fetch the source's payload at the moment it is needed. The three topics that usually get studied separately (entity binding, retrieval heads, and the lookback motif) are answers to those three jobs. Binding is how the source is marked. A retrieval head is the hardware that fetches. Lookback is the protocol that links the mark to the fetch by treating the binding as a pointer.

The example below is the smallest case that needs all three. Two characters each act on an object; a later query names one character and must report that character's object, not the nearer one. Surface proximity favors the wrong answer, so getting it right requires a stored reference rather than a local guess.

Figure 1 · Store an address, carry a pointer, dereference

1. Binding stores the address

Before anything can be retrieved, the source has to be marked. The model binds an entity to its attributes by co-locating their reference information in the residual stream of a single token, encoded as an abstract identifier (a binding ID, or in Prakash and coauthors' terms an ordering ID) that lives in a low-rank subspace. The identifier is variable-like: it does not name a position, so it survives intervening tokens. The causal test is the swap. Exchange two entities' binding IDs and the model reports the swapped attribute, which is the signature of a genuine binding variable rather than a positional cue. This is the storage step, covered in more detail on the binding page.
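The swap test can be illustrated with a toy model. This is a minimal sketch that loosely follows Feng and Steinhardt's additive picture and assumes each token's residual vector is a content part plus a binding-ID part in a separate subspace. The characters, objects, vector values, and helper names (`query`, `swap_binding_ids`) are hypothetical illustrations, not quantities read from any model.

```python
# Toy model of binding IDs. All vectors are hypothetical, not read from a real model.
# Each 4-d residual vector = content part (dims 0-1) + binding-ID part (dims 2-3).


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Two orthogonal binding IDs in the binding subspace (dims 2-3).
BINDING_ID = {0: [0.0, 0.0, 1.0, 0.0], 1: [0.0, 0.0, 0.0, 1.0]}

# Content vectors use only dims 0-1, so they never affect the binding match.
CONTENT = {
    "Alice": [1.0, 0.0, 0.0, 0.0],
    "Bob": [0.0, 1.0, 0.0, 0.0],
    "cup": [1.0, 1.0, 0.0, 0.0],
    "box": [1.0, -1.0, 0.0, 0.0],
}


def binding_part(vec):
    """Project a residual vector onto the binding subspace (zero out dims 0-1)."""
    return [0.0, 0.0] + vec[2:]


def build_context():
    """Alice is bound to the cup with ID 0; Bob is bound to the box with ID 1."""
    entities = {
        "Alice": add(CONTENT["Alice"], BINDING_ID[0]),
        "Bob": add(CONTENT["Bob"], BINDING_ID[1]),
    }
    attributes = {
        "cup": add(CONTENT["cup"], BINDING_ID[0]),
        "box": add(CONTENT["box"], BINDING_ID[1]),
    }
    return entities, attributes


def query(entity, entities, attributes):
    """Return the attribute whose binding ID best matches the entity's binding ID."""
    key = binding_part(entities[entity])
    return max(attributes, key=lambda name: dot(key, binding_part(attributes[name])))


def swap_binding_ids(entities, a, b):
    """Exchange the binding-ID parts of entities a and b; content parts stay put."""
    swapped = dict(entities)
    swapped[a] = entities[a][:2] + entities[b][2:]
    swapped[b] = entities[b][:2] + entities[a][2:]
    return swapped


if __name__ == "__main__":
    entities, attributes = build_context()
    print(query("Alice", entities, attributes))  # cup
    swapped = swap_binding_ids(entities, "Alice", "Bob")
    print(query("Alice", swapped, attributes))  # box
```

Swapping only the binding-ID components makes the query for Alice return the box even though Alice's content vector is untouched, which is the signature the swap test looks for.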

2. Retrieval heads do the fetch

The fetch is performed by a small, distinctive set of attention heads. Across model families and scales, fewer than 5% of heads carry most long-context recall; they are already present in models pretrained on short contexts, and the same heads remain responsible when the context window is extended. Functionally, a retrieval head performs a conditional copy-paste: when its query matches a source, it copies that source token forward along the head's value pathway (the OV path). Masking these heads degrades the output in stages, from complete recall to partial recall to a fluent but unsupported answer, which ties them to factuality rather than to fluency. A retrieval head is the conditional-copy generalization of the induction head, and it appears in the catalog of attention-head labels as one of the more thoroughly causally tested kinds.
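Wu and coauthors identify retrieval heads with a copy-paste score measured on needle-in-a-haystack runs. The sketch below is a simplified reconstruction, not their code: it assumes a head counts as copying a token when its top-attended context position lies inside the needle and holds exactly the token being generated. The needle is treated as a set of tokens, and the function names, toy tokens, and the 0.1 cutoff in `is_retrieval_head` are assumptions for illustration.

```python
from typing import List, Sequence


def retrieval_score(
    context: Sequence[str],
    needle_span: range,
    generated: Sequence[str],
    head_attention: Sequence[Sequence[float]],
) -> float:
    """Fraction of the needle's tokens that one head copy-pastes during decoding.

    context:        input tokens
    needle_span:    positions of the needle inside `context`
    generated:      output tokens, one per decoding step
    head_attention: this head's attention weights over `context` at each step
    """
    needle = {context[i] for i in needle_span}  # simplification: needle as a token set
    copied = set()  # tokens this head copy-pasted from the needle
    for token, weights in zip(generated, head_attention):
        top = max(range(len(weights)), key=weights.__getitem__)  # argmax position
        # Copy-paste criteria: the generated token belongs to the needle, and the
        # head's top-attended position is inside the needle and holds that token.
        if token in needle and top in needle_span and context[top] == token:
            copied.add(token)
    return len(copied & needle) / len(needle)


def is_retrieval_head(score: float, threshold: float = 0.1) -> bool:
    """Assumed cutoff: heads scoring above `threshold` are labeled retrieval heads."""
    return score > threshold


def one_hot(n: int, i: int) -> List[float]:
    """Attention pattern that puts all weight on position i (toy helper)."""
    return [1.0 if k == i else 0.0 for k in range(n)]


if __name__ == "__main__":
    context = ["filler", "the", "code", "is", "7319", "filler"]
    needle_span = range(1, 5)  # "the code is 7319"
    generated = ["the", "code", "is", "7319"]
    n = len(context)
    copier = [one_hot(n, 1 + t) for t in range(4)]  # tracks the needle token by token
    partial = [one_hot(n, 1), one_hot(n, 2), one_hot(n, 0), one_hot(n, 0)]
    idle = [one_hot(n, 0) for _ in range(4)]  # always attends to filler
    for name, attn in [("copier", copier), ("partial", partial), ("idle", idle)]:
        s = retrieval_score(context, needle_span, generated, attn)
        print(name, s, is_retrieval_head(s))
```

The score only rewards attention that lands on the exact source token being emitted, which is what separates a copy-paste head from one that merely attends somewhere inside the needle.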

Figure 2 · Mask the retrieval heads and recall degrades by stages

The degradation is graded, not a cliff. With a few heads masked, recall is still mostly intact; mask most of them and the model keeps producing fluent text while the answer detaches from its source and falls back to the nearer, wrong object. The loss shows up in what the answer points to, not in whether the model produces one.

3. Lookback joins the two

Lookback is the protocol that turns a stored binding into a completed retrieval. When the source is recalled, its reference is copied into an address in the residual stream of the recalled token. At the later position that needs the information, a matching pointer is formed. A retrieval head then dereferences the pointer: it attends from the pointer back to the token whose address matches and copies the bound payload forward. Figure 1 walks through the four states: bind, pointer, dereference, emit.
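The four states can be sketched as a toy pointer dereference. This is a minimal illustration under stated assumptions, not the circuit Prakash and coauthors report: addresses are hypothetical one-hot ordering IDs, payloads are one-hot object vectors, the dereference is a single softmax attention step (QK match, then OV copy), and all names and values, including `sharpness`, are invented for the example.

```python
import math

# Hypothetical ordering-ID addresses and object payloads (one-hot for readability).
ADDRESS = {0: [1.0, 0.0], 1: [0.0, 1.0]}
PAYLOAD = {"cup": [1.0, 0.0], "box": [0.0, 1.0]}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bind(pairs):
    """Bind: give each (entity, object) pair an ordering ID and store it as an
    address in the object token's residual, next to the object's payload."""
    tokens, entity_ids = [], {}
    for oid, (entity, obj) in enumerate(pairs):
        entity_ids[entity] = oid
        tokens.append({"address": ADDRESS[oid], "payload": PAYLOAD[obj]})
    return tokens, entity_ids


def form_pointer(entity, entity_ids):
    """Pointer: the query position carries a copy of the queried entity's ID."""
    return ADDRESS[entity_ids[entity]]


def dereference(pointer, tokens, sharpness=10.0):
    """Dereference: score each address against the pointer (QK), softmax, and
    copy the attention-weighted payloads forward (OV)."""
    weights = softmax([sharpness * dot(pointer, t["address"]) for t in tokens])
    dim = len(tokens[0]["payload"])
    return [sum(w * t["payload"][d] for w, t in zip(weights, tokens)) for d in range(dim)]


def emit(vec):
    """Emit: read out the object whose payload best matches the copied vector."""
    return max(PAYLOAD, key=lambda name: dot(vec, PAYLOAD[name]))


if __name__ == "__main__":
    # Alice's cup is mentioned first; Bob's box is nearer the query.
    tokens, ids = bind([("Alice", "cup"), ("Bob", "box")])
    print(emit(dereference(form_pointer("Alice", ids), tokens)))  # cup
    print(emit(dereference(form_pointer("Bob", ids), tokens)))  # box
```

The query about Alice retrieves the cup even though the box is the nearer mention, because in this toy the attention score depends only on the address match, not on position.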

The motif recurs beyond the single example. Prakash and coauthors find it implementing belief tracking, where a query about one character's view of an object dereferences the binding set up earlier in the passage, and a visibility relation is itself stored as an ID and looked back to. Later work studies the pointers themselves (how the dereferenced payload is brought into the residual stream of the final token) and how lookback is rebound when an entity's state changes mid-context.

Three questions probe one algorithm. Binding is the representation question: which direction carries the identifier. The retrieval head is the routing question: which head reads it. Lookback is the protocol question: how an address written at one position is dereferenced at another. A model can hold the binding correctly and still answer wrong if the pointer dereferences to a nearer token; this is the same representation-versus-routing split that lets a distractor win on the binding page.
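The representation-versus-routing split can be shown with a similar toy. In this sketch, which illustrates the logic rather than a measured effect, the attention logit is an address-match term plus a hypothetical `recency_bias` that favors later positions; `match_gain`, the vectors, and the object names are all assumptions.

```python
# Toy routing failure: the binding is intact, but the lookup lands on the nearer token.
# All vectors, names, and gains are hypothetical.
ADDRESSES = [[1.0, 0.0], [0.0, 1.0]]  # ordering IDs stored at the two object tokens
OBJECTS = ["cup", "box"]  # the box is mentioned later, i.e. nearer the query


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def best_address(pointer):
    """Representation check: which object's address does the pointer match best?"""
    return OBJECTS[max(range(len(ADDRESSES)), key=lambda k: dot(pointer, ADDRESSES[k]))]


def route(pointer, match_gain, recency_bias):
    """Routing: logit = address match + a bias that grows with position (hard attention)."""
    logits = [
        match_gain * dot(pointer, address) + recency_bias * position
        for position, address in enumerate(ADDRESSES)
    ]
    return OBJECTS[max(range(len(logits)), key=logits.__getitem__)]


if __name__ == "__main__":
    pointer = [1.0, 0.0]  # the query asks about the first character, whose object is the cup
    for bias in (1.0, 5.0):
        print(bias, best_address(pointer), route(pointer, match_gain=4.0, recency_bias=bias))
```

With a weak bias the lookup lands on the cup; with a strong one it lands on the box, while `best_address` still reports the correct binding. A check on the representation passes in both cases; only the routing changed.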

4. One mechanism lets a result transfer

Treating binding, retrieval heads, and lookback as one mechanism is what lets a result transfer. A swap intervention shows the address is real; a head-masking experiment shows the fetch is real; a pointer-patching experiment shows the dereference is real. Each is evidence about a different stage of the same pointer-and-payload algorithm, and a claim about in-context recall is strongest when it names which stage it tests. The mechanism still varies by model and task, so the motif is a template for analysis rather than a fixed circuit.
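A pointer-patching experiment can be sketched in the same toy setting. This is a hypothetical illustration of the activation-patching logic, assuming a model whose forward pass exposes a cache of named activations: run a clean prompt (about Alice) and a counterfactual prompt (about Bob), copy one activation from the counterfactual run into the clean run, and check whether the answer follows. The site names `pointer` and `side_feature` and all vector values are invented.

```python
# Toy activation-patching check on the pointer. Site names and vectors are hypothetical.
POINTER = {"Alice": [1.0, 0.0], "Bob": [0.0, 1.0]}  # pointer formed at the query position
SIDE = {"Alice": [0.3, 0.7], "Bob": [0.9, 0.1]}  # entity-dependent, never read by the lookup
OBJECT_TOKENS = [([1.0, 0.0], "cup"), ([0.0, 1.0], "box")]  # (address, payload)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def run(entity, patch=None):
    """Toy forward pass returning (answer, cache of named activations).
    `patch` overwrites cached activations before the lookup reads them."""
    cache = {"pointer": list(POINTER[entity]), "side_feature": list(SIDE[entity])}
    if patch:
        cache.update(patch)
    scores = [dot(cache["pointer"], address) for address, _ in OBJECT_TOKENS]
    best = max(range(len(scores)), key=scores.__getitem__)
    return OBJECT_TOKENS[best][1], cache


def patched_answer(clean_entity, counterfactual_entity, site):
    """Copy activation `site` from the counterfactual run into the clean run."""
    _, counterfactual_cache = run(counterfactual_entity)
    answer, _ = run(clean_entity, patch={site: counterfactual_cache[site]})
    return answer


if __name__ == "__main__":
    print(run("Alice")[0])  # cup
    print(patched_answer("Alice", "Bob", "pointer"))  # box: the pointer carries the reference
    print(patched_answer("Alice", "Bob", "side_feature"))  # cup: this site is off the path
```

Patching the pointer flips the answer to the counterfactual object, while patching an activation the lookup never reads leaves it unchanged, which is the contrast a dereference claim rests on.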

Citations

Feng and Steinhardt (2024), "How do Language Models Bind Entities in Context?", for binding-ID interventions.
Wu, Wang, Xiao, Peng, and Fu (2025), "Retrieval Head Mechanistically Explains Long-Context Factuality", for sparse, universal, intrinsic retrieval heads and masking experiments.
Prakash, Shaham, Rager, Geva, Bau, and coauthors (2025), "Language Models Use Lookbacks to Track Beliefs", for the lookback motif and ordering IDs.
Feng, Geiger, and coauthors (2025), "Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context", for the pointers used in lookback.

Related pages

Binding for the dependency the mechanism resolves and the swap intervention.
Attention Head Labels for retrieval heads in context with other labels.
Induction Heads for the copy circuit that retrieval generalizes.
Causal Interventions for the patching tests that probe each stage.