Compositionality & Semantic Probes

Phrase meaning depends on relations between words, not only on word-level vectors.

The principle of compositionality says that the meaning of a complex expression depends on the meanings of its parts and the way those parts are combined. In formal semantics this is usually expressed with types, functions, arguments, and lambda-style composition. For neural representations, the corresponding question is geometric: can we read the meaning of a phrase or parent constituent from the representations of its children?

Semantic probes are harder to interpret than simple label probes because meaning is relational. A noun phrase, a verb-object pair, and a modifier construction are not just two vectors sitting next to each other. Composition probes therefore often operate on pairs: head and dependent, function and argument, modifier and modified phrase.

Figure 1 · Three hypotheses for semantic composition

1. Probe form changes the hypothesis

An additive probe predicts the phrase vector as $c = A h + B d$ from the head $h$ and dependent $d$, treating composition as a sum of two linear maps. A bilinear probe adds a multiplicative term: $h^\top W d$ for a scalar readout, or one such term per output dimension, $c_k = (A h + B d)_k + h^\top W_k d$, for a vector readout. This lets the dependent reshape the head's contribution, and it does so in a relation-specific way when a separate $W$ is learned for each relation. A nonlinear probe gives the readout still more freedom. The choice is not only engineering; it expresses a hypothesis about how much structure is already present in the representation. The additive form echoes vector-averaging models of phrase meaning, while the bilinear form echoes tensor-product and categorical models, where a relation acts multiplicatively on its arguments.
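The three probe forms can be written down in a few lines. The following is a minimal NumPy sketch, not a reference implementation: the dimensions, the random weights, the one-hidden-layer form of the nonlinear probe, and all function names are illustrative assumptions. It also assumes one matrix per output dimension so that the bilinear term yields a vector rather than a scalar.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 3  # input and output dimensions (illustrative values)

h = rng.normal(size=n)  # head representation
d = rng.normal(size=n)  # dependent representation

# Additive probe: c = A h + B d
A = rng.normal(size=(k, n))
B = rng.normal(size=(k, n))

def additive_probe(h, d):
    return A @ h + B @ d

# Bilinear probe: one matrix W[i] per output dimension, so that
# c_i = (A h + B d)_i + h^T W[i] d
W = rng.normal(size=(k, n, n))

def bilinear_probe(h, d):
    return additive_probe(h, d) + np.einsum("i,kij,j->k", h, W, d)

# Nonlinear probe (assumed form): a one-hidden-layer MLP on the concatenated pair
hidden = 8
W1 = rng.normal(size=(hidden, 2 * n))
W2 = rng.normal(size=(k, hidden))

def nonlinear_probe(h, d):
    return W2 @ np.tanh(W1 @ np.concatenate([h, d]))
```

In a real probing setup the weights would be trained on held-out phrase targets rather than sampled. The point here is only the difference in functional form: the additive probe is linear in the pair, the bilinear probe is linear in each input separately, and the nonlinear probe is neither.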

If an additive probe works well, the model's representation may already place parts in a space where composition is close to linear. If only a flexible nonlinear probe works, the probe may be doing more of the semantic work itself. As with syntactic probes, the interpretation depends on capacity, controls, and interventions.
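One way to make this comparison concrete is to fit probes of different capacity on the same targets and compare fit quality. The sketch below uses synthetic data only: the generating matrices, the sample size, and the helper `r_squared` are assumptions for illustration, and the in-sample fit shown here stands in for what should be a held-out evaluation with controls in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, num_pairs = 5, 3, 2000

H = rng.normal(size=(num_pairs, n))  # head vectors, one per row
D = rng.normal(size=(num_pairs, n))  # dependent vectors, one per row

def r_squared(X, Y):
    """Fit Y ~ X M by least squares and return the in-sample R^2."""
    M, *_ = np.linalg.lstsq(X, Y, rcond=None)
    residual = Y - X @ M
    total = Y - Y.mean(axis=0)
    return 1.0 - (residual ** 2).sum() / (total ** 2).sum()

# Synthetic phrase targets from two hypothetical generating processes.
A = rng.normal(size=(k, n))
B = rng.normal(size=(k, n))
W = rng.normal(size=(k, n, n))
Y_additive = H @ A.T + D @ B.T                    # c = A h + B d
Y_bilinear = np.einsum("pi,kij,pj->pk", H, W, D)  # c_k = h^T W_k d

# Probe features: concatenation for the additive probe,
# all pairwise products h_i * d_j for the bilinear probe.
X_additive = np.hstack([H, D])
X_bilinear = np.einsum("pi,pj->pij", H, D).reshape(num_pairs, -1)

r2_add_on_add = r_squared(X_additive, Y_additive)
r2_add_on_bil = r_squared(X_additive, Y_bilinear)
r2_bil_on_bil = r_squared(X_bilinear, Y_bilinear)
```

In this synthetic setup the additive probe should fit additive targets almost perfectly and purely bilinear targets hardly at all, while the bilinear probe recovers the bilinear targets. With real representations the gap will be less clean, which is why capacity controls and interventions still matter.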

2. Types and argument structure

Formal semantics distinguishes predicates, arguments, quantifiers, modifiers, and type-shifting operations. This vocabulary separates semantic tasks that would otherwise collapse into a vague "meaning" label. A subject relation, an object relation, and an adjunct relation are not automatically the same composition operation.

Head
The word that fixes a phrase's category and what it can combine with, such as the verb in a verb phrase or the noun in a noun phrase. Composition is described relative to the head.
Dependent
A word or phrase attached to the head: a subject, object, or modifier. A composition probe takes a head-dependent pair as input rather than a single token.
Dependency relation
The typed head-dependent link. Universal Dependencies labels such as nsubj, obj, amod, and nmod name the composition site: a head, a dependent, and the relation between them. A bilinear probe can condition on the relation, not just the two words.
Argument
An expression that fills a role a predicate requires, as "the door" fills the object role of "open." In Montague-style semantics a predicate denotes a function and its arguments are the inputs it applies to, written f(x).
Type shift
A change of semantic type so an expression can compose where a different type is expected, for example coercing a proper name (type e) into a quantifier-like meaning (type ⟨⟨e,t⟩,t⟩). Type shifting keeps composition well-typed when surface forms do not line up.
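The function-argument and type-shift entries above can be illustrated with a toy type checker. This is a sketch under stated assumptions: the class names, the `apply_type` and `lift` helpers, and the tiny model in which only "kim" sleeps are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Basic:
    name: str  # "e" for entities, "t" for truth values

@dataclass(frozen=True)
class Fn:
    arg: "Type"
    result: "Type"

Type = Union[Basic, Fn]

E, T = Basic("e"), Basic("t")
ET = Fn(E, T)      # one-place predicate, type <e,t>
QUANT = Fn(ET, T)  # quantifier-like meaning, type <<e,t>,t>

def apply_type(fn: Type, arg: Type) -> Type:
    """Function application: <a,b> applied to a yields b."""
    if isinstance(fn, Fn) and fn.arg == arg:
        return fn.result
    raise TypeError(f"cannot apply {fn} to {arg}")

def lift(entity):
    """Type shift e -> <<e,t>,t>: map x to the function (lambda P: P(x))."""
    return lambda predicate: predicate(entity)

# Toy model (hypothetical): "sleeps" is true of "kim" only.
def sleeps(x):
    return x in {"kim"}  # type <e,t>

kim_lifted = lift("kim")  # behaves like type <<e,t>,t>
```

Here `apply_type(ET, E)` and `apply_type(QUANT, ET)` both yield type t, while `apply_type(E, ET)` raises a `TypeError` because an entity is not a function. Calling `kim_lifted(sleeps)` gives the same truth value as `sleeps("kim")`, which is what the lift is meant to preserve.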

3. Where additive composition breaks down

Intersective modifiers are the case additive composition handles well. A "red car" is a thing that is both red and a car, so a meaning built by adding a "red" component to a "car" vector lands in roughly the right place. Adjectives that work this way are why simple vector-averaging models of phrase meaning get as far as they do.

Privative modifiers break that. A "fake gun" is not a gun, and a "former senator" is not currently a senator, so the modifier moves the result outside the head's own category rather than adding a feature to it. Under $c = A h + B d$ the modifier contributes the same offset $B d$ whatever head it attaches to, so it cannot cancel whichever category feature the head happens to supply, which is what these phrases require. Privatives are one non-intersective class; subsective adjectives are another, and they fail additivity for a softer reason. A "skillful surgeon" is still a surgeon, but "skillful" means something different for surgeons than for violinists, so its contribution depends on the head it attaches to. A bilinear term can represent both, because $h^\top W d$ lets the modifier's effect vary with the head instead of being a fixed offset. When an additive probe fails on such phrases and a bilinear one succeeds, the gap is evidence, subject to the usual capacity controls, that the representation encodes the modifier as an operation rather than as a direction to add.
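The difference between a fixed offset and a head-specific effect can be checked numerically. In the sketch below the vectors named `surgeon`, `violinist`, and `skillful` are random stand-ins rather than real embeddings, the weights are random, and measuring the modifier's effect against a zero-vector baseline is a simplifying assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 4, 3

A = rng.normal(size=(k, n))
B = rng.normal(size=(k, n))
W = rng.normal(size=(k, n, n))

def additive(h, d):
    return A @ h + B @ d

def bilinear(h, d):
    return additive(h, d) + np.einsum("i,kij,j->k", h, W, d)

def modifier_effect(probe, h, d):
    """Change in the phrase vector caused by attaching modifier d to head h."""
    return probe(h, d) - probe(h, np.zeros_like(d))

# Random stand-ins for two heads and one modifier (not real embeddings).
surgeon, violinist = rng.normal(size=n), rng.normal(size=n)
skillful = rng.normal(size=n)

add_surgeon = modifier_effect(additive, surgeon, skillful)
add_violinist = modifier_effect(additive, violinist, skillful)
bil_surgeon = modifier_effect(bilinear, surgeon, skillful)
bil_violinist = modifier_effect(bilinear, violinist, skillful)
```

Under the additive probe the two effects are identical and equal to `B @ skillful`, whichever head is used. Under the bilinear probe they differ, because the term $h^\top W_k d$ changes with the head.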

The schematic is not a result: the additive, bilinear, and nonlinear forms illustrate different composition hypotheses and do not report findings for any particular model or dataset.

What next