Attention Head Labels

Labels such as positional, induction, and name-mover heads are hypotheses about circuits, not stable kinds.

Attention-head labels name patterns that recur across analyses: positional heads, induction heads, syntactic heads, rare-word heads, copy-suppression heads, retrieval heads, and name-mover heads. The labels are hypotheses about computations, not stable ontological kinds. A head is a parameterized computation in a layer, not a species.

The same surface pattern can support different functions, and the same function can be distributed across several heads. A cluster of head embeddings or attention patterns is evidence about similarity. It is not by itself evidence about causal role.

Figure 1 · Clusters before and after a null

1. What the labels describe

Labels help orient analysis. In Figure 1, a cluster labeled "positional" means the heads share offset or boundary-token behavior; a cluster labeled "induction" means the heads participate in a prefix-match-and-copy pattern. Those words make the analysis discussable. They also provide intervention hypotheses: if this is a copy head, then ablating or patching it should change copying behavior under the relevant metric.

Positional
Attends by relative offset, often to the previous token, or to boundary tokens such as the start-of-sequence position. The heatmap is a band parallel to the diagonal. Common and easy to find, and rarely an interesting computation on its own.
Induction
On a repeated sequence, attends from a token back to the position just after that token's previous occurrence, so the OV path can copy what followed. The signature of in-context pattern continuation; see the Induction Heads page.
Retrieval
On a query whose answer is copied verbatim from the context, attends from the generation position to the matching source token so the OV path can copy it. This is the conditional-copy generalization of an induction head. A small, stable set of such heads (under about five percent of heads) accounts for much of long-context recall. Masking them makes the model hallucinate a fluent but unsupported answer rather than lose fluency, which ties the label to factuality and makes it one of the better causally tested kinds. The lookback mechanism uses these heads to dereference a stored binding.
Syntactic
Attends along a dependency-like relation, for example a determiner to its noun or an object to its verb. Tested by checking whether the head's strongest attention matches gold dependency edges more often than a positional baseline does; most heads match those edges only partially.
Name mover
In the indirect-object-identification circuit, moves a candidate name's information toward the final position so it can be predicted. Defined by its role in that circuit, not by a generic heatmap shape.
Copy suppression
Attends to an earlier instance of the current token and writes against it, lowering the probability of repeating a token already in context. A clear OV-defined function rather than a heatmap shape.
Rare-word
Attends to low-frequency tokens, often to carry an unusual word forward. A frequency-conditioned pattern that overlaps with other roles.
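The test described under "Syntactic" can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the five-token sentence, the attention grid, and the gold edges are all invented, the grid is assumed to be unmasked (bidirectional), and the function names are hypothetical.

```python
def argmax_edge_accuracy(attn, gold_head):
    """Fraction of tokens whose most-attended source is their gold dependency head."""
    hits = 0
    for dst, head in gold_head.items():
        pred = max(range(len(attn[dst])), key=lambda src: attn[dst][src])
        hits += int(pred == head)
    return hits / len(gold_head)

def offset_baseline_accuracy(gold_head, offset):
    """Accuracy of always guessing the token at a fixed relative offset."""
    hits = sum(1 for dst, head in gold_head.items() if dst + offset == head)
    return hits / len(gold_head)

# Invented example: "the dog chased a cat"; the root "chased" has no gold head.
tokens = ["the", "dog", "chased", "a", "cat"]
gold_head = {0: 1, 1: 2, 3: 4, 4: 2}

# Invented attention grid: rows are destinations, columns are sources.
attn = [
    [0.10, 0.70, 0.10, 0.05, 0.05],
    [0.20, 0.10, 0.60, 0.05, 0.05],
    [0.20, 0.20, 0.20, 0.20, 0.20],
    [0.05, 0.05, 0.10, 0.10, 0.70],
    [0.05, 0.10, 0.60, 0.20, 0.05],
]

head_accuracy = argmax_edge_accuracy(attn, gold_head)
best_baseline = max(offset_baseline_accuracy(gold_head, k) for k in (-2, -1, 1, 2))
margin = head_accuracy - best_baseline
```

In this toy sentence a fixed "next token" guess already recovers three of the four edges, so raw accuracy overstates how syntactic the head is; the margin over the best offset baseline is the quantity worth reporting.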
Figure 2 · Idealized attention patterns (schematic, not measured)

Each panel is a destination-by-source attention grid for one short sentence, darker where a destination token attends more to a source token. The shapes are caricatures of the labels, drawn to be told apart, not read off a real model.
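A grid of this kind can be scored against the positional and induction descriptions above. The sketch below assumes an attention matrix stored as a list of rows (destination by source) and uses two idealized one-hot grids built by hand; the function names and the repeated token sequence are illustrative, not taken from any library.

```python
def prev_token_score(attn):
    """Mean attention each destination puts on the token directly before it."""
    rows = range(1, len(attn))
    return sum(attn[i][i - 1] for i in rows) / len(rows)

def induction_score(attn, tokens):
    """Mean attention from a repeated token to the position just after
    that token's most recent earlier occurrence."""
    last_seen = {}
    scores = []
    for i, tok in enumerate(tokens):
        if tok in last_seen:
            target = last_seen[tok] + 1
            if target < i:  # skip immediate repeats, where the target is i itself
                scores.append(attn[i][target])
        last_seen[tok] = i
    return sum(scores) / len(scores) if scores else 0.0

def one_hot_rows(targets, n):
    """Build a grid where destination i attends only to targets[i]."""
    return [[1.0 if src == t else 0.0 for src in range(n)] for t in targets]

tokens = ["A", "B", "C", "A", "B", "C"]
n = len(tokens)

# Caricature of a previous-token head: every row attends one step back.
prev_head = one_hot_rows([0, 0, 1, 2, 3, 4], n)
# Caricature of an induction head: the second "A" attends to the first "B", and so on;
# tokens with no earlier occurrence fall back to position 0.
ind_head = one_hot_rows([0, 0, 0, 1, 2, 3], n)

scores = {
    "prev_head": (prev_token_score(prev_head), induction_score(prev_head, tokens)),
    "ind_head": (prev_token_score(ind_head), induction_score(ind_head, tokens)),
}
```

Scores like these describe the attention pattern only. A high induction score says where the head looks, not what its OV path writes once it gets there.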

2. What the labels hide

Head identity is layer-sensitive. Two heads with similar attention heatmaps can write very different OV outputs. Two heads with different heatmaps can contribute to the same behavior. A cluster may reflect layer, token-position statistics, or training artifacts more than a human-interpretable function.

A within-layer null asks whether the apparent separation remains after controlling for layer position. Other nulls shuffle labels, compare to simple positional baselines, or test whether a cluster predicts causal importance.
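One way to run a within-layer null is a permutation test that shuffles labels only among heads in the same layer. The sketch below is an assumption about how such a test could be set up, not the procedure behind Figure 1: the feature values, the two labels, and the separation statistic (absolute difference of group means) are all invented for illustration.

```python
import random
from collections import defaultdict

def separation(values, labels):
    """Absolute difference between the mean feature of the two label groups."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    mean_a, mean_b = (sum(g) / len(g) for g in groups.values())
    return abs(mean_a - mean_b)

def permutation_p(values, labels, layers, within_layer, n_perm=2000, seed=0):
    """Share of label shuffles whose separation is at least the observed one."""
    rng = random.Random(seed)
    observed = separation(values, labels)
    blocks = defaultdict(list)
    for idx, layer in enumerate(layers):
        blocks[layer if within_layer else "all"].append(idx)
    at_least = 0
    for _ in range(n_perm):
        shuffled = list(labels)
        for idxs in blocks.values():
            block_labels = [labels[i] for i in idxs]
            rng.shuffle(block_labels)  # labels move only inside their block
            for i, label in zip(idxs, block_labels):
                shuffled[i] = label
        if separation(values, shuffled) >= observed - 1e-12:
            at_least += 1
    return (at_least + 1) / (n_perm + 1)

# Invented data: the feature depends on layer only, and each label lives in one layer.
values = [0.10, 0.20, 0.15, 0.12, 0.90, 0.80, 0.85, 0.95]
layers = [0, 0, 0, 0, 1, 1, 1, 1]
labels = ["positional"] * 4 + ["induction"] * 4

p_global = permutation_p(values, labels, layers, within_layer=False)
p_within = permutation_p(values, labels, layers, within_layer=True)
```

Here the global shuffle makes the clusters look well separated, while the within-layer shuffle cannot distinguish the labels from layer membership at all, so its p-value is 1. That is the case the null is meant to catch: a separation that is entirely explained by layer.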

Patterns precede mechanism claims. Attention heatmaps are often where a hypothesis starts. QK/OV analysis and interventions determine whether pattern evidence extends to a mechanism claim.

3. From label to circuit

The indirect-object task shows how much a label depends on the circuit around it. On "When John and Mary went to the store, John gave a drink to," the heads that push Mary toward the output are name movers. Ablate them and the prediction does not collapse as far as the label predicts, because backup name movers that were quiet on the clean run step up to do the same job. The original heads were named for a role they hold only while the backups stay idle.
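The gap between what a head writes and what its removal costs can be made concrete with a toy model. Everything here is hypothetical: the contribution of 3.0, the backup gain of 0.7, and the rule that backups write in proportion to the missing signal are invented to illustrate compensation, not measured from any model.

```python
PRIMARY_WRITE = 3.0  # invented logit-difference contribution of the name movers

def logit_diff(primary_on, backup_gain=0.7):
    """Toy logit difference (correct name minus distractor) at the final position."""
    primary = PRIMARY_WRITE if primary_on else 0.0
    # Hypothetical backups: silent on the clean run, active when signal is missing.
    backup = backup_gain * (PRIMARY_WRITE - primary)
    return primary + backup

clean = logit_diff(primary_on=True)
ablated = logit_diff(primary_on=False)

direct_effect = PRIMARY_WRITE    # what the primary heads write on the clean run
total_effect = clean - ablated   # what the output actually loses under ablation
```

In this toy setting the total effect is much smaller than the direct effect, so an ablation score alone would understate the role the name movers play on the clean run.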

The same circuit contains heads whose OV path writes against the correct name, a copy-suppression role that lowers an answer rather than promoting it. A heatmap alone would not separate a promoter from a suppressor, since both can attend to the name; the sign of the OV write does. The label that survives is the one tied to what the head writes and to how the rest of the circuit responds when it is removed, which is why head identity is better read as a position in a circuit than as a fixed type. The backup behavior is an instance of the self-repair discussed under causal interventions.
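The sign test can be sketched as a direct logit effect: attention weight times the projection of the head's OV write onto the name's unembedding direction. This is a simplified illustration with a two-dimensional residual stream and invented matrices; it ignores layer normalization and any downstream heads, and the names below are hypothetical.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def direct_logit_effect(attn_weight, w_ov, source_resid, unembed_dir):
    """Change in one token's logit from this head's write, ignoring later layers."""
    write = matvec(w_ov, source_resid)
    return attn_weight * dot(write, unembed_dir)

# Invented toy values: both heads attend to the name with the same weight.
attn_to_name = 0.8
name_resid = [1.0, 0.0]     # residual stream at the name's position
name_unembed = [1.0, 0.0]   # unembedding direction for the name token

promoter_ov = [[2.0, 0.0], [0.0, 1.0]]
suppressor_ov = [[-2.0, 0.0], [0.0, 1.0]]

promoter_effect = direct_logit_effect(attn_to_name, promoter_ov, name_resid, name_unembed)
suppressor_effect = direct_logit_effect(attn_to_name, suppressor_ov, name_resid, name_unembed)
```

The two heads would produce identical heatmaps, yet one raises the name's logit and the other lowers it by the same amount.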

A head-type label becomes a mechanism claim only after the analysis identifies what the head writes, tests whether the label survives a matched null, and shows whether changing the head changes the behavior the label names.
