Compositionality & Semantic Probes
The principle of compositionality says that the meaning of a complex expression depends on the meanings of its parts and the way those parts are combined. In formal semantics this is usually expressed with types, functions, arguments, and lambda-style composition. In neural representations, the same question becomes geometric: can we read the meaning of a phrase or parent constituent from the representations of its children?
Semantic probes are harder to interpret than simple label probes because meaning is relational. A noun phrase, a verb-object pair, and a modifier construction are not just two vectors sitting next to each other. Composition probes therefore often operate on pairs: head and dependent, function and argument, modifier and modified phrase.
1. Probe form changes the hypothesis
An additive probe predicts the phrase vector as $c = A h + B d$ from the head $h$ and dependent $d$, treating composition as a sum of two linear maps. A bilinear probe adds a multiplicative term $h^\top W d$ (for a vector-valued $c$, one such form per output coordinate, $h^\top W_k d$), which lets the dependent reshape the head's contribution; with a separate $W$ for each dependency relation, that reshaping becomes relation-specific. A nonlinear probe gives the readout still more freedom. The choice is not only engineering; it expresses a hypothesis about how much structure is already present in the representation. The additive form echoes vector-averaging models of phrase meaning, while the bilinear form echoes tensor-product and categorical models, where a relation acts multiplicatively on its arguments.
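The two probe forms can be written out directly. This is a minimal NumPy sketch with random, untrained parameters; the function names, the hidden size, and the per-relation dictionary `W_by_rel` are illustrative assumptions, not an established probing API. For a vector-valued output the bilinear term needs one matrix per output coordinate, so `W` is stored as a tensor of shape `(out_dim, dim_h, dim_d)`.

```python
import numpy as np

def additive_probe(h, d, A, B):
    # c = A h + B d: each part goes through its own linear map, then the results are summed.
    return A @ h + B @ d

def bilinear_probe(h, d, A, B, W):
    # c_k = (A h)_k + (B d)_k + h^T W[k] d, one bilinear form per output coordinate k.
    return A @ h + B @ d + np.einsum("i,kij,j->k", h, W, d)

def dependent_effect(probe, h, d, d_ref, *params):
    # How far the output moves when dependent d_ref is swapped for d, with the head held fixed.
    return probe(h, d, *params) - probe(h, d_ref, *params)

rng = np.random.default_rng(0)
dim = 8  # hypothetical hidden size
A, B = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))
# Hypothetical relation-conditioned probe: one bilinear tensor per dependency label.
W_by_rel = {rel: rng.normal(size=(dim, dim, dim)) for rel in ("nsubj", "obj", "amod")}

h1, h2, d, d_ref = rng.normal(size=(4, dim))
add1 = dependent_effect(additive_probe, h1, d, d_ref, A, B)
add2 = dependent_effect(additive_probe, h2, d, d_ref, A, B)
bil1 = dependent_effect(bilinear_probe, h1, d, d_ref, A, B, W_by_rel["amod"])
bil2 = dependent_effect(bilinear_probe, h2, d, d_ref, A, B, W_by_rel["amod"])

print(np.allclose(add1, add2))  # True: the dependent's effect is the same for both heads
print(np.allclose(bil1, bil2))  # False: under the bilinear probe the effect depends on the head
```

The comparison at the end is the structural difference in miniature: under the additive probe, swapping one dependent for another shifts the output by $B(d - d_{\text{ref}})$ whatever the head is, while the bilinear term makes that shift a function of the head.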
If an additive probe works well, the model's representation may already place parts in a space where composition is close to linear. If only a flexible nonlinear probe works, the probe may be doing more of the semantic work itself. As with syntactic probes, the interpretation depends on capacity, controls, and interventions.
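A minimal sketch of the additive half of that diagnostic, assuming head, dependent, and parent vectors have already been extracted as row-aligned arrays. Here they are synthetic stand-ins rather than model states: one parent is built additively from its parts, the other as an elementwise product. The helper `heldout_r2_additive` and the 80/20 split are illustrative choices, not a standard procedure.

```python
import numpy as np

def heldout_r2_additive(H, D, C, train_frac=0.8):
    """Fit c ≈ A h + B d + b by least squares on a training split; return pooled R^2 on the rest."""
    n_train = int(train_frac * len(C))
    X = np.hstack([H, D, np.ones((len(C), 1))])  # [h; d; 1], so A, B and a bias are fit jointly
    M, *_ = np.linalg.lstsq(X[:n_train], C[:n_train], rcond=None)
    pred, y = X[n_train:] @ M, C[n_train:]
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean(axis=0)) ** 2)
    return float(1.0 - ss_res / ss_tot)

rng = np.random.default_rng(0)
n, dim = 500, 8
H, D = rng.normal(size=(n, dim)), rng.normal(size=(n, dim))
A, B = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))

# Synthetic parents, not real model representations:
C_linear = H @ A.T + D @ B.T + 0.1 * rng.normal(size=(n, dim))  # additive in its parts
C_product = H * D                                               # elementwise product of its parts

r2_linear = heldout_r2_additive(H, D, C_linear)
r2_product = heldout_r2_additive(H, D, C_product)
print(f"held-out R^2: linear target {r2_linear:.3f}, product target {r2_product:.3f}")
```

The same additive probe recovers the first target almost perfectly and explains essentially none of the variance in the second. A low held-out $R^2$ says only that this particular readout fails; before reading a flexible probe's success as a property of the representation, it still has to be checked against capacity controls and interventions.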
2. Types and argument structure
Formal semantics distinguishes predicates, arguments, quantifiers, modifiers, and type-shifting operations. This vocabulary separates semantic tasks that would otherwise collapse into a vague "meaning" label. A subject relation, an object relation, and an adjunct relation are not automatically the same composition operation.
- Head: The word that fixes a phrase's category and what it can combine with, such as the verb in a verb phrase or the noun in a noun phrase. Composition is described relative to the head.
- Dependent: A word or phrase attached to the head, such as a subject, object, or modifier. A composition probe takes a head-dependent pair as input rather than a single token.
- Dependency relation: The typed head-dependent link. Universal Dependencies labels such as nsubj, obj, amod, and nmod name it, so each labeled arc marks a composition site: a head, a dependent, and the relation between them. A bilinear probe can condition on the relation, not just on the two words.
- Argument: An expression that fills a role a predicate requires, as "the door" fills the object role of "open." In Montague-style semantics a predicate is a function and its arguments are the inputs it applies to, written f(x).
- Type shift: A change of semantic type that lets an expression compose where a different type is expected, for example coercing a proper name (type e) into a quantifier-like meaning (type ⟨⟨e,t⟩,t⟩). Type shifting keeps composition well-typed when surface forms do not line up.
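The argument and type-shift entries can be made concrete with a toy extensional model in which predicates are ordinary Python functions. Everything here is illustrative: the domain and predicate extensions are invented, `lift` implements the lifting shift from an entity $x$ to $\lambda P.\,P(x)$ in the style of Partee (1986), and `every` and `conj_q` are hypothetical helpers.

```python
# A tiny extensional model; the entities and predicate extensions are hypothetical.
DOMAIN = {"ann", "bo", "cy"}
STUDENTS = {"ann", "bo"}
SLEEPERS = {"ann", "bo", "cy"}
SMILERS = {"cy"}

def student(x):  # type <e,t>: maps an entity to a truth value
    return x in STUDENTS

def sleeps(x):   # type <e,t>
    return x in SLEEPERS

def smiles(x):   # type <e,t>
    return x in SMILERS

def every(restrictor):
    # type <<e,t>,<<e,t>,t>>: "every student" is a function still waiting for a predicate.
    return lambda scope: all(scope(x) for x in DOMAIN if restrictor(x))

def lift(x):
    # Type shift e -> <<e,t>,t>: the entity x becomes λP.P(x), a function over predicates.
    return lambda pred: pred(x)

def conj_q(q1, q2):
    # "and" over two quantifier meanings of type <<e,t>,t>: both must hold of the predicate.
    return lambda pred: q1(pred) and q2(pred)

print(sleeps("cy"))                                # plain application f(x): True
print(lift("cy")(sleeps))                          # lifted name, same truth value: True
print(every(student)(sleeps))                      # True
# "cy and every student" needs both conjuncts at type <<e,t>,t>, so the name is lifted first.
print(conj_q(lift("cy"), every(student))(sleeps))  # True
print(conj_q(lift("cy"), every(student))(smiles))  # False: ann and bo do not smile
```

Without `lift`, `conj_q("cy", every(student))(sleeps)` raises a `TypeError`, because a string cannot be applied to a predicate; that type mismatch is exactly what the shift repairs.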
3. Where additive composition breaks down
Intersective modifiers are the case additive composition handles well. A "red car" is a thing that is both red and a car, so a meaning built by adding a "red" component to a "car" vector lands in roughly the right place. Adjectives that work this way are why simple vector-averaging models of phrase meaning get as far as they do.
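In set terms, the intersective case and the averaging model it is compared with can be written side by side. Here $[\![\cdot]\!]$ is the set of things an expression is true of, and $v_{\text{red}}$, $v_{\text{car}}$ stand for illustrative word vectors rather than vectors from any particular model:

$$[\![\text{red car}]\!] = [\![\text{red}]\!] \cap [\![\text{car}]\!], \qquad c_{\text{red car}} \approx \tfrac{1}{2}\left(v_{\text{red}} + v_{\text{car}}\right)$$

Both operations are symmetric in the two words, and in neither does the adjective's contribution depend on which noun it meets, which is roughly why averaging can stand in for intersection for adjectives of this kind.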
Privative modifiers break that. A "fake gun" is not a gun, and a "former senator" is not currently a senator, so the modifier moves the result outside the head's own category rather than adding a feature to it. Averaging keeps the result on the segment between the two word vectors, so "fake gun" stays close to "gun"; the more general additive form $A h + B d$ still gives the modifier a fixed offset $B d$, the same for every head, whereas "fake" has to cancel whichever category its head names. Privatives are one non-intersective class; subsective adjectives are another and fail additivity for a softer reason. A "skillful surgeon" is still a surgeon, but "skillful" means something different for surgeons than for violinists, so its contribution depends on the head it attaches to. A bilinear term can capture both, because $h^\top W d$ lets the modifier apply a head-dependent transformation instead of a fixed offset. When an additive probe fails on such phrases and a bilinear one succeeds, the gap is evidence that the representation encodes the modifier as an operation, not as a direction to add.
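A toy check of the privative argument, under deliberately artificial assumptions: head nouns are hypothetical one-hot category vectors, "red" keeps the head's category and adds a red feature, and "fake" flips the sign of the head's category and adds a fake feature. Both probe forms are fit by least squares on every head-modifier pair; the bilinear form is implemented by appending the outer-product features $h_i d_j$, which is equivalent to learning $h^\top W_k d$ for each output coordinate.

```python
import numpy as np

N_HEADS = 4             # hypothetical head nouns (gun, diamond, ...) as one-hot categories
MODS = ["red", "fake"]  # one intersective and one privative modifier

def toy_dataset(n_heads=N_HEADS):
    """All (head, modifier) pairs with hypothetical target phrase vectors.

    Output layout: n_heads category dims, then one feature dim per modifier.
    'red X' keeps X's category (+1); 'fake X' flips it (-1).
    """
    H, D, C = [], [], []
    for i in range(n_heads):
        for m, name in enumerate(MODS):
            h = np.eye(n_heads)[i]
            d = np.eye(len(MODS))[m]
            sign = 1.0 if name == "red" else -1.0
            H.append(h)
            D.append(d)
            C.append(np.concatenate([sign * h, d]))
    return np.array(H), np.array(D), np.array(C)

def fit_mse(X, C):
    """Least-squares fit C ≈ X @ M; return the mean squared error over all entries."""
    M, *_ = np.linalg.lstsq(X, C, rcond=None)
    return float(np.mean((X @ M - C) ** 2))

H, D, C = toy_dataset()
# Additive probe: one linear map on the concatenation [h; d], i.e. c = A h + B d.
X_add = np.hstack([H, D])
# Bilinear probe: add the outer-product features h_i * d_j.
outer = np.einsum("ni,nj->nij", H, D).reshape(len(H), -1)
X_bil = np.hstack([H, D, outer])

mse_additive = fit_mse(X_add, C)
mse_bilinear = fit_mse(X_bil, C)
print(f"additive MSE {mse_additive:.3f}, bilinear MSE {mse_bilinear:.3f}")
```

With four heads the additive fit leaves a mean squared error of 0.125, all of it on the category dimensions, while the bilinear fit is exact. The residual is the head-by-modifier interaction: "red" must keep each head's category and "fake" must flip it, which no single offset $B d$ per modifier can do.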
- Montague (1973), "The Proper Treatment of Quantification in Ordinary English", for the type-theoretic compositionality tradition.
- Partee (1986), "Noun Phrase Interpretation and Type-Shifting Principles", for type-shifting operations.
- Mitchell and Lapata (2010), "Composition in Distributional Models of Semantics", for additive and multiplicative vector composition.
- Smolensky (1990), "Tensor Product Variable Binding and the Representation of Symbolic Structures in Connectionist Systems", for multiplicative role-filler binding.
- Coecke, Sadrzadeh, and Clark (2010), "Mathematical Foundations for a Compositional Distributional Model of Meaning", for tensorial distributional semantics.
- Dependency Trees & Structural Probes for the syntax side of the same problem.
- Probes and Validity for capacity controls when the semantic readout is flexible.