Attention Head Labels

Labels such as positional, induction, and name-mover heads are hypotheses about circuits, not stable kinds.

Attention-head labels name patterns that recur across analyses: positional heads, induction heads, syntactic heads, rare-word heads, copy-suppression heads, retrieval heads, and name-mover heads. The labels are hypotheses about computations, not stable ontological kinds. A head is a parameterized computation in a layer, not a species.

The same surface pattern can support different functions, and the same function can be distributed across several heads. A cluster of head embeddings or attention patterns is evidence about similarity. It is not by itself evidence about causal role.

Figure 1 · Clusters before and after a null

1. What the labels describe

Labels help orient analysis. In Figure 1, a cluster labeled "positional" means the heads share offset or boundary-token behavior; a cluster labeled "induction" means the heads participate in a prefix-match-and-copy pattern. Those words give the analysis a shared vocabulary. They also supply intervention hypotheses: if this is a copy head, ablating or patching it should change copying behavior under the relevant metric.

The important split is between pattern labels and mechanism labels. Positional, induction, retrieval, syntactic, name-mover, copy-suppression, and rare-word heads can look similar in attention space; only OV writes and interventions show what role a given head plays.

Figure 2 · Head label, attention pattern, and OV write sign

Each panel is a destination-by-source attention grid for one short sentence, darker where a destination token attends more to a source token. The shapes are caricatures of the labels, drawn to be told apart, not read off a real model.

2. What the labels hide

Head identity is layer-sensitive. Two heads with similar attention heatmaps can write very different OV outputs. Two heads with different heatmaps can contribute to the same behavior. A cluster may reflect layer, token-position statistics, or training artifacts more than a human-interpretable function.

A within-layer null asks whether the apparent separation remains after controlling for layer position. Other nulls shuffle labels, compare to simple positional baselines, or test whether a cluster predicts causal importance.
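One way to make the within-layer null concrete is a permutation test that shuffles labels only among heads in the same layer. The sketch below is illustrative, not a procedure taken from any of the cited papers: the helper names, the one-dimensional head feature, the gap-in-means statistic, and the data are all assumptions chosen so the example runs with the standard library alone.

```python
import random
from collections import defaultdict

def separation(features, labels):
    """Absolute gap between the mean feature of the two label groups."""
    groups = defaultdict(list)
    for value, label in zip(features, labels):
        groups[label].append(value)
    first, second = sorted(groups)  # assumes exactly two distinct labels
    return abs(sum(groups[first]) / len(groups[first])
               - sum(groups[second]) / len(groups[second]))

def shuffle_within_layer(labels, layers, rng):
    """Permute labels only among heads that share a layer."""
    by_layer = defaultdict(list)
    for index, layer in enumerate(layers):
        by_layer[layer].append(index)
    shuffled = list(labels)
    for indices in by_layer.values():
        pool = [labels[i] for i in indices]
        rng.shuffle(pool)
        for i, label in zip(indices, pool):
            shuffled[i] = label
    return shuffled

def null_p_value(features, labels, layers, within_layer, n_shuffles=2000, seed=0):
    """Share of shuffles whose separation is at least the observed one."""
    rng = random.Random(seed)
    observed = separation(features, labels)
    hits = 0
    for _ in range(n_shuffles):
        if within_layer:
            shuffled = shuffle_within_layer(labels, layers, rng)
        else:
            shuffled = list(labels)
            rng.shuffle(shuffled)
        if separation(features, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)

# Made-up data: the feature tracks layer, and the labels mostly track layer too.
layers = [0] * 8 + [1] * 8
labels = ["positional"] * 7 + ["induction"] + ["positional"] + ["induction"] * 7
features = [0.1, 0.0, 0.2, 0.1, 0.0, 0.1, 0.2, 0.1,
            1.0, 0.9, 1.1, 1.0, 0.9, 1.0, 1.1, 1.0]

p_global = null_p_value(features, labels, layers, within_layer=False)
p_within = null_p_value(features, labels, layers, within_layer=True)
print(f"global shuffle p = {p_global:.3f}")
print(f"within-layer shuffle p = {p_within:.3f}")
```

On this toy data the global shuffle makes the two labels look well separated, while the within-layer shuffle does not, because the feature depends on layer and on nothing else. A real analysis would swap in a multivariate separation statistic and actual head features; the shuffling logic is the part that carries over.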

Patterns precede mechanism claims. Attention heatmaps are often where a hypothesis starts. QK/OV analysis and interventions determine whether pattern evidence extends to a mechanism claim.

3. From label to circuit

The indirect-object task shows how much a label depends on the circuit around it. On "When John and Mary went to the store, John gave a drink to," the heads that push Mary toward the output are name movers. Ablate them and the prediction degrades less than the label would suggest, because backup name movers that were quiet on the clean run step up to do the same job. The original heads were named for a role they hold only while the backups stay idle.
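The gap between what a head writes and what ablating it removes can be shown with a deliberately crude toy. Everything in this sketch is hypothetical: the function `run_circuit`, the numbers, and especially the repair rule, in which backups write a fixed fraction of whatever the primary heads failed to write. It is not a model of the circuit in Wang et al.; it only shows why the two measurements can disagree.

```python
def run_circuit(primary_active, primary_write=3.0, backup_gain=0.5):
    """Toy logit difference (correct name minus distractor) for a two-group circuit."""
    primary = primary_write if primary_active else 0.0
    # Hypothetical repair rule: backups write a fraction of whatever the
    # primary heads failed to write. Real circuits need not behave this way.
    backup = backup_gain * (primary_write - primary)
    return {"primary": primary, "backup": backup, "logit_diff": primary + backup}

clean = run_circuit(primary_active=True)
ablated = run_circuit(primary_active=False)

# Direct effect: what the primary heads write on the clean run.
direct_effect = clean["primary"]
# Total effect: how much the output actually moves when they are ablated.
total_effect = clean["logit_diff"] - ablated["logit_diff"]

print(f"direct effect = {direct_effect}, total effect = {total_effect}")
```

With these assumed numbers the ablation removes only half of what the primary heads write directly, so an ablation-only analysis would understate their clean-run role, and a direct-write-only analysis would overstate how much the behavior depends on them.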

The same circuit contains heads whose OV path writes against the correct name, a copy-suppression role that lowers an answer rather than promoting it. A heatmap alone would not separate a promoter from a suppressor, since both can attend to the name; the sign of the OV write does. The label that survives is the one tied to what the head writes and to how the rest of the circuit responds when it is removed, which is why head identity is better read as a position in a circuit than as a fixed type. The backup behavior is an instance of the self-repair discussed under causal interventions.
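The promoter-versus-suppressor distinction can be sketched numerically. The example below assumes a two-dimensional residual stream, a row-major OV matrix applied as a matrix-vector product, and a direct path to the logits that ignores LayerNorm and all downstream layers. The vectors, matrices, and the helper `direct_logit_write` are invented for illustration; real libraries use their own weight layouts.

```python
def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def direct_logit_write(attn_row, source_resid, w_ov, unembed_dir):
    """Logit change one head writes for one token along the direct path only."""
    dim = len(source_resid[0])
    # Attention-weighted mix of the source residual vectors.
    mixed = [sum(a * src[k] for a, src in zip(attn_row, source_resid))
             for k in range(dim)]
    # The OV map turns that mix into the head's write to the residual stream.
    write = matvec(w_ov, mixed)
    # Project the write onto the token's unembedding direction.
    return dot(write, unembed_dir)

# Assumed toy setup: two source tokens with orthogonal residual vectors.
source_resid = [[1.0, 0.0],   # "John"
                [0.0, 1.0]]   # "Mary"
mary_unembed = [0.0, 1.0]

# Both heads share the same attention row: mostly on "Mary".
attn_row = [0.1, 0.9]

promoter_ov = [[2.0, 0.0], [0.0, 2.0]]      # writes along what it reads
suppressor_ov = [[-2.0, 0.0], [0.0, -2.0]]  # writes against what it reads

promoter_write = direct_logit_write(attn_row, source_resid, promoter_ov, mary_unembed)
suppressor_write = direct_logit_write(attn_row, source_resid, suppressor_ov, mary_unembed)
print(f"promoter: {promoter_write:+.2f}, suppressor: {suppressor_write:+.2f}")
```

The two heads are indistinguishable by heatmap, since `attn_row` is shared, yet their writes to the "Mary" logit have opposite signs. That sign comes entirely from the OV matrix, which is the information a pattern-only label leaves out.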

A head-type label becomes a mechanism claim only after the analysis identifies what the head writes, tests whether the label survives a matched null, and shows whether changing the head changes the behavior the label names.

Citations

Clark, Khandelwal, Levy, and Manning (2019), "What Does BERT Look At?", for early attention-pattern analysis.
Htut, Phang, Bordia, and Bowman (2019), "Do Attention Heads in BERT Track Syntactic Dependencies?", for syntactic head tests.
Michel, Levy, and Neubig (2019), "Are Sixteen Heads Really Better than One?", for head pruning and redundancy.
Wang, Variengien, Conmy, Shlegeris, and Steinhardt (2023), "Interpretability in the Wild", for IOI circuit head labels such as name movers.
McDougall, Conmy, Rushing, McGrath, and Nanda (2023), "Copy Suppression", for copy-suppression heads.
Wu, Wang, Xiao, Peng, and Fu (2025), "Retrieval Head Mechanistically Explains Long-Context Factuality", for retrieval heads and masking experiments.

Related pages

QK and OV Circuits for separating attention patterns from residual writes.
Causal Interventions for the ablation and patching tests that turn a label into a claim.
The Lookback Mechanism for how retrieval heads dereference a stored binding.

What next

Method: QK and OV Circuits. Separate attention pattern from residual write.
Example: Induction Heads. A named type with a specified copy circuit.