Binding

A use resolves against the right source, even when a nearer distractor is available.

Binding is the problem of keeping track of what refers to what. A pronoun resolves against an antecedent, a variable use resolves against its definition, a conclusion depends on the premises that license it. The surface domains differ, but the shape is the same: connect a use to a nonlocal source while a closer, plausible-looking candidate competes for the link.

The difficulty is structural rather than lexical. A model can get local word statistics right and still bind to the wrong source, because the correct source is often farther away than a distractor. That is why binding is studied with minimal pairs: change one token, hold most of the surface statistics fixed, and see whether the resolution follows the structure or the proximity.

Figure 1 · The same binding shape in three domains

1. Source, use, distractor

Each binding case has three parts: a source that licenses the answer, a use that needs resolving, and a distractor that is closer or more salient but wrong. Figure 1 lays the three domains side by side. Toggling the distractor swaps the licensed link for the lure, which is the failure a good test is built to provoke.

Subject-verb agreement makes the structural cue concrete. In "The key to the cabinets is rusty," the verb agrees with "key," not the nearer plural "cabinets." A minimal pair changes only the head noun's number, "the key" versus "the keys," so the verb form must track the grammatical subject rather than the closest noun. If the model prefers "are" after the singular head because "cabinets" is adjacent, it followed proximity; if the verb tracks "key," it followed structure.
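The agreement test can be sketched in a few lines of Python. This is a minimal illustration, not an evaluation harness: the names AgreementItem, structure_chooser, and proximity_chooser are hypothetical, and the two choosers are hand-written stand-ins for a model. In a real test the chooser would compare the model's probabilities for "is" and "are" after the prefix.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AgreementItem:
    head: str               # grammatical subject: the licensed source
    attractor: str          # nearer noun: the distractor
    head_plural: bool
    attractor_plural: bool

    def prefix(self) -> str:
        return f"The {self.head} to the {self.attractor}"

    def licensed_verb(self) -> str:
        # Grammar licenses agreement with the head noun only.
        return "are" if self.head_plural else "is"


def minimal_pair(attractor: str, attractor_plural: bool) -> List[AgreementItem]:
    """Two items that differ only in the number of the head noun."""
    return [
        AgreementItem("key", attractor, False, attractor_plural),
        AgreementItem("keys", attractor, True, attractor_plural),
    ]


def structure_chooser(item: AgreementItem) -> str:
    # Stand-in for a model that tracks the grammatical subject.
    return "are" if item.head_plural else "is"


def proximity_chooser(item: AgreementItem) -> str:
    # Stand-in for a model that agrees with the closest noun.
    return "are" if item.attractor_plural else "is"


def accuracy(chooser: Callable[[AgreementItem], str],
             items: List[AgreementItem]) -> float:
    correct = sum(chooser(item) == item.licensed_verb() for item in items)
    return correct / len(items)


items = minimal_pair("cabinets", attractor_plural=True)
print(accuracy(structure_chooser, items))  # 1.0
print(accuracy(proximity_chooser, items))  # 0.5
```

The proximity stand-in gets the plural-head item right by accident, which is why the pair is scored together: only a chooser that follows the head noun is correct on both members.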

2. Three domains

The same source-use-distractor template recurs with different vocabularies. The examples below are schematic, but they make the shared shape concrete.

Natural language
Agreement, anaphora, and filler-gap dependencies all create source-use links that cross intervening words. The licensed source is set by grammar, and a nearer noun or name is the standard distractor.
Code
In "let total = 0; ... let count = item.count; ... return total", the final use resolves to the accumulator while count sits nearby as a lure. The VarMisuse task tests whether a model tracks definitions and uses through program structure rather than text order.
Logic
From A -> B, B -> C, and A, the conclusion C depends on the chain through B. A premise such as D -> C is irrelevant unless D is established, so it functions as a distractor.
Binding vs. variable binding. These cases are not all variable binding in the strict sense. Code use-to-definition, logical chaining, wh-traces, and bound-variable anaphora ("every farmer who owns a donkey beats it") genuinely bind a variable to a value or operator. Referential anaphora ("John ... he") is coreference, and subject-verb agreement is a grammatical dependency; neither binds a variable. What unifies the family is the looser property the page is built around: resolving a use against a nonlocal source. The broad term is therefore binding, with variable binding reserved for the genuine-variable subset and for the variable-like mechanism described below.
A shared test, not a shared result. The figure is a schema for evaluations, not evidence that one circuit handles all three domains. A positive result shows that the model follows the licensed source and resists the nearer unlicensed one. The recurring design is to specify the source-use relation, introduce a plausible distractor, and compare minimal pairs that differ in which source is structurally licensed.
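The logic case above can be made concrete with a small forward-chaining sketch. It is illustrative only: forward_chain is a hypothetical helper, and rules are simplified to single-premise implications written as (premise, conclusion) pairs. It records which premise licensed each derived atom, so the distractor rule D -> C can be seen to play no part in deriving C.

```python
from typing import Dict, Iterable, List, Set, Tuple

Rule = Tuple[str, str]  # (premise, conclusion), read as premise -> conclusion


def forward_chain(facts: Iterable[str],
                  rules: List[Rule]) -> Tuple[Set[str], Dict[str, str]]:
    """Derive everything reachable from the facts.

    Returns the set of known atoms and, for each derived atom,
    the premise of the rule that licensed it.
    """
    known = set(facts)
    support: Dict[str, str] = {}
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            # A rule fires only when its premise is already established.
            if premise in known and conclusion not in known:
                known.add(conclusion)
                support[conclusion] = premise
                changed = True
    return known, support


rules = [("A", "B"), ("B", "C"), ("D", "C")]  # D -> C is the distractor
known, support = forward_chain({"A"}, rules)

print(sorted(known))  # ['A', 'B', 'C']
print(support["C"])   # B
```

Because D is never established, the rule D -> C never fires, and the recorded support for C runs through B. A test item built on this shape asks whether a model's justification follows the same chain.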

3. Candidate mechanisms

Induction-like copying
A prefix match can route from a use to an earlier matching context and copy the following token or feature. This is the mechanism described on the induction heads page.
Binding-ID features
The model can mark a source and its later uses with a shared, abstract identifier that survives intervening tokens, so the link is carried by a feature rather than by position.
Use-to-source attention
A head can route directly from the use site to the licensed source, which is what a distractor closer in surface order is designed to disrupt.
Scope-sensitive representations
For code, a model can encode scope and syntax-tree position, so a use resolves through program structure rather than nearby text.
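As one illustration of the first entry in the list above, induction-like copying can be sketched as a plain token-matching function. This is a toy under stated assumptions: induction_copy and its prefix_len parameter are hypothetical, and a real induction head matches learned features rather than exact token strings.

```python
from typing import List, Optional


def induction_copy(tokens: List[str], prefix_len: int = 1) -> Optional[str]:
    """Find the most recent earlier context that matches the last
    `prefix_len` tokens and copy the token that followed it."""
    suffix = tokens[-prefix_len:]
    # Scan candidate positions from most recent to earliest.
    for end in range(len(tokens) - 1, prefix_len - 1, -1):
        if tokens[end - prefix_len:end] == suffix:
            return tokens[end]
    return None


tokens = ["x", "=", "1", ";", "y", "=", "2", ";", "x", "="]

print(induction_copy(tokens, prefix_len=1))  # 2
print(induction_copy(tokens, prefix_len=2))  # 1
```

With a one-token prefix the match lands on the nearer "=" and copies the distractor's value. With a two-token prefix the match requires "x =" and reaches the farther, licensed source. The same copying rule can therefore bind correctly or incorrectly depending on how much of the context it matches.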

The binding-ID account has the most direct causal evidence. Feng and Steinhardt describe abstract binding vectors that tie an entity to its attributes; an intervention that swaps two entities' binding vectors swaps which attribute the model reports, which is the signature of a genuine binding variable rather than a positional cue. Dai and coauthors locate the same structure in a low-rank subspace, so the binding direction occupies a small part of the residual stream.
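The swap intervention described above can be imitated in a toy vector model. The sketch below is not the published method: the entity and attribute names, the six-dimensional space, and the orthogonal binding vectors are all assumptions chosen for clarity. In a real model the binding vectors would be estimated from activations, not set by hand.

```python
import numpy as np

# Toy assumption: content and binding information live in orthogonal directions.
basis = np.eye(6)
content = {
    "Alice": basis[0], "Bob": basis[1],   # hypothetical entities
    "apple": basis[2], "pear": basis[3],  # hypothetical attributes
}
binding = [basis[4], basis[5]]            # two hypothetical binding-ID vectors
B = np.stack(binding)                     # projects onto the binding subspace


def encode(name: str, binding_id: int) -> np.ndarray:
    """Representation = content vector + binding-ID vector."""
    return content[name] + binding[binding_id]


def report_attribute(entity_vec: np.ndarray, attribute_vecs: dict) -> str:
    """Return the attribute whose binding component best matches the entity's."""
    entity_id = B @ entity_vec
    scores = {name: float((B @ vec) @ entity_id)
              for name, vec in attribute_vecs.items()}
    return max(scores, key=scores.get)


entities = {"Alice": encode("Alice", 0), "Bob": encode("Bob", 1)}
attributes = {"apple": encode("apple", 0), "pear": encode("pear", 1)}

print(report_attribute(entities["Alice"], attributes))  # apple

# Intervention: swap the two entities' binding vectors, leave content untouched.
swapped = {
    "Alice": entities["Alice"] - binding[0] + binding[1],
    "Bob": entities["Bob"] - binding[1] + binding[0],
}
print(report_attribute(swapped["Alice"], attributes))  # pear
print(report_attribute(swapped["Bob"], attributes))    # apple
```

Token order and content vectors are unchanged by the swap, so the flipped answer can only come from the binding component. That is the sense in which a successful swap counts as evidence for a binding variable and not a positional cue.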

4. Representation and routing together

Binding is not a single circuit because it combines two jobs. The first is representation: the model has to mark the source, which raises the question of which feature or direction carries the identifier. The second is routing: the model has to deliver that mark to the use site, which raises the question of which head reads it. A model can do one well and the other poorly, which is how a nearer distractor wins: the source is represented, but routing attends to proximity instead.

The lookback mechanism is the account that joins the two jobs into one algorithm: the binding ID is written as an address, carried forward as a pointer, and dereferenced by a retrieval head that looks back to the source. The mechanism still varies by model and domain, so a result on agreement does not transfer automatically to code or logic; the shared template is what makes the comparison worth running. Where binding-ID interventions succeed, they show a model using an abstract variable, which is a stronger claim than any probe-level decodability result for the same dependency.
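The address, pointer, and dereference steps can be laid out as a small data-structure sketch. It is schematic only: Slot, dereference, and nearest are hypothetical names, the integer addresses stand in for binding-ID features, and nothing here models attention weights.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    position: int
    value: str
    address: Optional[int] = None  # binding ID written at the source


def dereference(slots: List[Slot], pointer: int) -> Optional[str]:
    """Retrieval step: look back for the slot whose address matches the pointer."""
    for slot in reversed(slots):
        if slot.address == pointer:
            return slot.value
    return None


def nearest(slots: List[Slot]) -> str:
    """Proximity baseline: return the most recent slot, ignoring addresses."""
    return slots[-1].value


# The source "total" is farther from the use site than the distractor "count".
slots = [Slot(0, "total", address=7), Slot(1, "count", address=3)]
pointer = 7  # copy of the source's address, carried forward to the use site

print(dereference(slots, pointer))  # total
print(nearest(slots))               # count
```

The two functions separate the jobs named in this section. The address on each slot is the representation, and dereference is the routing. A model whose routing behaves like nearest fails even though the source was marked correctly.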
