Compositionality & Semantic Probes

Phrase meaning depends on relations between words, not only on word-level vectors.

The principle of compositionality says that the meaning of a complex expression depends on the meanings of its parts and the way those parts are combined. In formal semantics this is usually expressed with types, functions, arguments, and lambda-style composition. In neural representations, the same question becomes geometric: can we read the meaning of a phrase or parent constituent from the representations of its children?

Semantic probes are harder to interpret than simple label probes because meaning is relational. A noun phrase, a verb-object pair, and a modifier construction are not just two vectors sitting next to each other. Composition probes therefore often operate on pairs: head and dependent, function and argument, modifier and modified phrase.

Figure 1 · Three hypotheses for semantic composition (interactive schematic; controls: composition form, phrase type, interaction strength)

1. Probe form changes the hypothesis

An additive probe predicts the phrase vector from the head $h$ and dependent $d$ as $c = A h + B d$, so composition is a sum of two linear maps. A bilinear probe adds a multiplicative term $h^\top W d$ (a scalar for a single matrix $W$; a vector-valued probe uses one such form per output coordinate), which lets the dependent reshape the head's contribution in a relation-specific way. A nonlinear probe gives the readout still more freedom. The choice is not only an engineering decision; it expresses a hypothesis about how much structure is already present in the representation. The additive form echoes vector-averaging models of phrase meaning, while the bilinear form echoes tensor-product and categorical models, where a relation acts multiplicatively on its arguments.
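The two probe forms can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a trained probe: the matrices `A` and `B`, the stack `W` (one matrix per output coordinate, assumed here so that the bilinear term is vector-valued), and the toy 2-d vectors are all hypothetical values chosen for readability.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def additive_probe(h: Vector, d: Vector, A: Matrix, B: Matrix) -> Vector:
    """c = A h + B d: each word contributes independently of the other."""
    return [a + b for a, b in zip(matvec(A, h), matvec(B, d))]


def bilinear_probe(h: Vector, d: Vector, A: Matrix, B: Matrix,
                   W: List[Matrix]) -> Vector:
    """Additive part plus one bilinear form h^T W[k] d per output coordinate k."""
    base = additive_probe(h, d, A, B)
    interaction = [
        sum(hi * wd for hi, wd in zip(h, matvec(Wk, d))) for Wk in W
    ]
    return [b + t for b, t in zip(base, interaction)]


# Toy 2-d values, chosen only for illustration (hypothetical, not fitted).
h = [1.0, 2.0]                     # head vector
d = [3.0, -1.0]                    # dependent vector
I2 = [[1.0, 0.0], [0.0, 1.0]]      # identity, used for both A and B
W = [
    [[0.0, 1.0], [0.0, 0.0]],      # output 0 receives h[0] * d[1]
    [[0.0, 0.0], [1.0, 0.0]],      # output 1 receives h[1] * d[0]
]

c_add = additive_probe(h, d, I2, I2)      # [4.0, 1.0]
c_bil = bilinear_probe(h, d, I2, I2, W)   # [3.0, 7.0]
```

With `A` and `B` set to the identity, the additive prediction is just the vector sum; the bilinear prediction differs only through products of head and dependent coordinates, which is the extra structure the bilinear hypothesis allows.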

If an additive probe works well, the model's representation may already place parts in a space where composition is close to linear. If only a flexible nonlinear probe works, the probe may be doing more of the semantic work itself. As with syntactic probes, the interpretation depends on capacity, controls, and interventions.

2. Types and argument structure

This page needs only a small slice of formal-semantics vocabulary. A composition probe usually takes a head, a dependent, and their dependency relation as input. Subject, object, and modifier relations are not automatically the same composition operation, so a bilinear probe can condition on the relation rather than treating every pair of words alike.
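One way to condition on the relation, sketched here as an assumption rather than a prescribed design, is to keep a separate interaction matrix per dependency label. The labels `nsubj`, `obj`, and `amod` follow Universal Dependencies naming; the table `W_BY_RELATION`, the function names, and all numeric values are hypothetical.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def bilinear_form(h: Vector, W: Matrix, d: Vector) -> float:
    """Compute the scalar h^T W d."""
    Wd = [sum(w * x for w, x in zip(row, d)) for row in W]
    return sum(hi * v for hi, v in zip(h, Wd))


# One interaction matrix per dependency relation (hypothetical toy values).
W_BY_RELATION: Dict[str, Matrix] = {
    "nsubj": [[1.0, 0.0], [0.0, 0.0]],
    "obj":   [[0.0, 0.0], [0.0, 1.0]],
    "amod":  [[0.0, 1.0], [1.0, 0.0]],
}


def relation_term(h: Vector, d: Vector, relation: str) -> float:
    """Interaction term for one (head, dependent, relation) triple."""
    return bilinear_form(h, W_BY_RELATION[relation], d)


h = [1.0, 2.0]     # head vector (toy)
d = [3.0, -1.0]    # dependent vector (toy)

# The same word pair yields a different interaction under each relation.
terms = {rel: relation_term(h, d, rel) for rel in W_BY_RELATION}
# {'nsubj': 3.0, 'obj': -2.0, 'amod': 5.0}
```

Keeping the matrices separate makes the hypothesis explicit: the pair of vectors is the same in all three cases, and only the relation changes how they interact.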

3. Where additive composition breaks down

Intersective modifiers are the case additive composition handles well. A "red car" is a thing that is both red and a car, so a meaning built by adding a "red" component to a "car" vector lands in roughly the right place. Adjectives that work this way are why simple vector-averaging models of phrase meaning get as far as they do.

Privative modifiers break that. A "fake gun" is not a gun, and a "former senator" is not currently a senator, so the modifier moves the result outside the head's own category rather than adding a feature to it. An additive probe cannot express this in general: the modifier's contribution $B d$ is the same offset whatever the head is, so it cannot cancel the category features of whichever head it happens to attach to. (With plain vector addition or averaging, the result is further confined to the plane spanned by the two word vectors.) Privatives are one non-intersective class; subsective adjectives are another and fail additivity for a softer reason. A "skillful surgeon" is still a surgeon, but "skillful" means something different for surgeons than for violinists, so its contribution depends on the head it attaches to. A bilinear term can in principle handle both, because $h^\top W d$ lets the modifier's contribution vary with the head instead of being a fixed offset. When an additive probe fails on such phrases and a bilinear one succeeds, the gap is evidence, subject to the usual capacity controls, that the representation encodes the modifier as an operation rather than as a direction to add.
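The head-dependence argument can be checked with a small "interaction contrast": swap heads and modifiers across two phrases and take the difference of differences. For any additive form the contrast is exactly zero, because each word's contribution cancels; for a bilinear form it equals $(h_1 - h_2)^\top W (d_1 - d_2)$. The sketch below uses a scalar readout for brevity, and every vector, weight, and name (`h_gun`, `d_fake`, and so on) is a hypothetical toy value, not a real embedding.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def additive(h: Vector, d: Vector, a: Vector, b: Vector) -> float:
    """Scalar additive readout: a.h + b.d."""
    return dot(a, h) + dot(b, d)


def bilinear(h: Vector, d: Vector, a: Vector, b: Vector, W: Matrix) -> float:
    """Additive readout plus the interaction term h^T W d."""
    Wd = [dot(row, d) for row in W]
    return additive(h, d, a, b) + dot(h, Wd)


def interaction_contrast(f: Callable[[Vector, Vector], float],
                         h1: Vector, h2: Vector,
                         d1: Vector, d2: Vector) -> float:
    """f(h1,d1) - f(h1,d2) - f(h2,d1) + f(h2,d2); zero if f is additive."""
    return f(h1, d1) - f(h1, d2) - f(h2, d1) + f(h2, d2)


# Hypothetical toy vectors standing in for two heads and two modifiers.
h_gun, h_senator = [1.0, 0.0], [0.0, 1.0]
d_fake, d_red = [1.0, 1.0], [1.0, 0.0]

a, b = [2.0, -1.0], [0.5, 3.0]          # additive weights (toy)
W = [[0.0, 2.0], [-1.0, 0.0]]           # interaction matrix (toy)

add_contrast = interaction_contrast(
    lambda h, d: additive(h, d, a, b), h_gun, h_senator, d_fake, d_red)
bil_contrast = interaction_contrast(
    lambda h, d: bilinear(h, d, a, b, W), h_gun, h_senator, d_fake, d_red)
# add_contrast == 0.0, bil_contrast == 2.0
```

A nonzero contrast measured on phrase representations would indicate that the modifier's effect changes with the head, which no choice of additive weights can reproduce.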

The schematic is not a result: the additive, bilinear, and nonlinear forms illustrate different composition hypotheses, not findings for a particular model or dataset.

Citations

Montague (1973), "The Proper Treatment of Quantification in Ordinary English", for the type-theoretic compositionality tradition.
Partee (1986), "Noun Phrase Interpretation and Type-Shifting Principles", for type-shifting operations.
Mitchell and Lapata (2010), "Composition in Distributional Models of Semantics", for additive and multiplicative vector composition.
Smolensky (1990), "Tensor Product Variable Binding and the Representation of Symbolic Structures in Connectionist Systems", for multiplicative role-filler binding.
Coecke, Sadrzadeh, and Clark (2010), "Mathematical Foundations for a Compositional Distributional Model of Meaning", for tensorial distributional semantics.

Related pages

Dependency Trees & Structural Probes for the syntax side of the same problem.
Probes and Validity for capacity controls when the semantic readout is flexible.
