Dependency Trees & Structural Probes

Structural probes test whether dependency-tree distance and depth are readable as geometry.

Dependency grammar represents a sentence as a tree. Every word except the root depends on exactly one head. For example, an adjective depends on the noun it modifies, a subject depends on its verb, and a determiner depends on its noun. This gives two geometric targets: the distance between any two words in the tree, and the depth of each word below the root.

A structural probe asks whether a linear transformation of contextual word representations can make those syntactic quantities visible. In the original version, squared distance in the transformed space is trained to match tree distance, and squared norm is trained to match tree depth. If the transformed geometry recovers a dependency tree, that suggests syntax is not just linearly classifiable one edge at a time: a whole tree-like structure can be read from the representation.

Figure 1 · Structural probe as tree geometry (interactive figure with a layer slider and minimum-spanning-tree steps)

1. The syntactic object

A dependency tree is sparse: a sentence with $n$ words has $n - 1$ dependency edges, so five words give four edges. But the structural probe trains against richer quantities. Every pair of words has a tree distance, and every word has a depth. Those targets turn the tree into a geometry problem.

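For a concrete case, take "The cat sat on the mat." Under an annotation scheme in which the preposition on heads its object mat, the tree distance from cat to sat is 1 and from cat to mat is 3, following the path from cat to sat to on to mat (a scheme that attaches mat directly to sat would give 2). Depth counts edges from the root, so sat has depth 0 and cat has depth 1.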
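The distances and depths in this example can be computed directly from a list of head indices. The sketch below is only an illustration: the `heads` encoding (index of each word's head, `-1` for the root) and the function names are assumptions made here, and the attachments follow the preposition-headed path described above.

```python
from collections import deque

# Hypothetical encoding: heads[i] is the index of word i's head, -1 marks the root.
# Attachments assume "on" heads "mat", matching the path cat -> sat -> on -> mat.
words = ["The", "cat", "sat", "on", "the", "mat"]
heads = [1, 2, -1, 2, 5, 3]

def tree_distances(heads):
    """All-pairs path lengths in the undirected dependency tree (BFS from each word)."""
    n = len(heads)
    adj = [[] for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)
    dist = [[0] * n for _ in range(n)]
    for start in range(n):
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, d = queue.popleft()
            dist[start][node] = d
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    return dist

def tree_depths(heads):
    """Number of edges from each word up to the root."""
    depths = []
    for i in range(len(heads)):
        node, d = i, 0
        while heads[node] >= 0:
            node = heads[node]
            d += 1
        depths.append(d)
    return depths

dist = tree_distances(heads)
depth = tree_depths(heads)
print(dist[1][2], dist[1][5])  # cat-sat and cat-mat: 1 3
print(depth)                   # [2, 1, 0, 1, 3, 2]
```

Six words give five edges but fifteen unordered word pairs, which is why the distance target is so much denser than the tree itself.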

The distance probe learns a linear map $B$ so that squared distances $\lVert B(h_i - h_j)\rVert^2$ approximate dependency-tree distances. In the original work the depth probe is trained separately, with its own map, so that $\lVert B h_i\rVert^2$ approximates depth from the root.
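A minimal NumPy sketch of these two quantities follows, together with an L1 objective averaged over word pairs in the spirit of the original distance probe. The function names and the toy embedding are assumptions for illustration, not the authors' code. The toy check gives every non-root word its own axis and represents each word by the set of edges on its path to the root; with $B$ set to the identity, squared distance then equals tree distance exactly and squared norm equals depth, which is one way to see why squared quantities are a natural fit for trees.

```python
import numpy as np

def probe_sq_distances(H, B):
    """Squared distances ||B(h_i - h_j)||^2 for all pairs. H is (n, m), B is (k, m)."""
    T = H @ B.T                              # (n, k) transformed vectors
    diff = T[:, None, :] - T[None, :, :]     # (n, n, k) pairwise differences
    return (diff ** 2).sum(axis=-1)          # (n, n)

def probe_sq_norms(H, B):
    """Squared norms ||B h_i||^2, the depth prediction."""
    return ((H @ B.T) ** 2).sum(axis=-1)

def distance_loss(H, B, gold_dist):
    """L1 gap between predicted and gold distances, averaged over n^2 word pairs."""
    n = H.shape[0]
    return np.abs(probe_sq_distances(H, B) - gold_dist).sum() / n ** 2

# Toy embedding (an assumption, not model output): axis j stands for the edge
# from word j to its head, and word i switches on every edge on its root path.
heads = [1, 2, -1, 2, 5, 3]                  # "The cat sat on the mat", root = sat
n = len(heads)
H = np.zeros((n, n))
for i in range(n):
    j = i
    while heads[j] >= 0:
        H[i, j] = 1.0
        j = heads[j]

B = np.eye(n)                                # identity probe for the toy case
D = probe_sq_distances(H, B)
depths = probe_sq_norms(H, B)
print(D[1, 2], D[1, 5])                      # cat-sat and cat-mat
print(depths)
```

In a real probe, `H` would hold one layer's contextual vectors for a sentence and `B` would be fitted by gradient descent on `distance_loss` summed over a treebank.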

Because predicted distances are pairwise and symmetric, evaluation extracts an unrooted minimum spanning tree from them and scores it with UUAS: the fraction of gold undirected edges recovered. UAS requires edge direction and LAS additionally requires labels, neither of which this geometry probe produces.
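The evaluation step can be sketched as follows, using Prim's algorithm on a dense distance matrix. This is an illustration under stated assumptions: the function names and the perturbed `pred_dist` matrix are invented here, and the sketch does not exclude punctuation, which published evaluations typically do.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense symmetric matrix; returns undirected edges (i, j), i < j."""
    n = len(dist)
    in_tree = {0}
    edges = set()
    while len(in_tree) < n:
        # Cheapest edge that connects the current tree to a word outside it.
        i, j = min(
            ((a, b) for a in in_tree for b in range(n) if b not in in_tree),
            key=lambda e: dist[e[0]][e[1]],
        )
        in_tree.add(j)
        edges.add((min(i, j), max(i, j)))
    return edges

def uuas(pred_edges, heads):
    """Fraction of gold undirected edges that appear among the predicted edges."""
    gold = {(min(c, h), max(c, h)) for c, h in enumerate(heads) if h >= 0}
    return len(pred_edges & gold) / len(gold)

heads = [1, 2, -1, 2, 5, 3]  # "The cat sat on the mat", root = sat
gold_dist = [
    [0, 1, 2, 3, 5, 4],
    [1, 0, 1, 2, 4, 3],
    [2, 1, 0, 1, 3, 2],
    [3, 2, 1, 0, 2, 1],
    [5, 4, 3, 2, 0, 1],
    [4, 3, 2, 1, 1, 0],
]

# Hypothetical probe output: "the" is pulled toward "on" and away from "mat".
pred_dist = [row[:] for row in gold_dist]
pred_dist[3][4] = pred_dist[4][3] = 0.5
pred_dist[4][5] = pred_dist[5][4] = 1.5

print(uuas(minimum_spanning_tree(gold_dist), heads))  # 1.0
print(uuas(minimum_spanning_tree(pred_dist), heads))  # 0.8
```

One misplaced distance costs exactly one of the five gold edges here, because the spanning tree must still connect every word.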

2. Middle-layer peak

Many probing studies find that syntactic information is easiest to decode in middle layers. Early layers are close to surface form, while later layers are closer to task-specific or next-token-output information. Syntax is an intermediate abstraction, so it often peaks in the middle. The layer slider uses a schematic curve; it is not a measurement from a particular model.

In Hewitt and Manning's original setup, a distance probe on BERT-base recovers roughly 80% UUAS on the Penn Treebank test set, read from layer 7. BERT-large reaches about 82% around layers 15 and 16.

Geometry is not parsing. Recovering a tree with a probe does not show that the model runs a parser internally. It shows that parse-like structure can be read from the representation under the probe's assumptions.

3. Controls and limitations

Structural probes inherit the general probe-validity problem. Baselines, control tasks, and data splits affect the interpretation. A model may encode word position, lexical association, or local adjacency in ways that help tree recovery without amounting to syntactic knowledge. Lexical controls and controlled syntactic phenomena reduce semantic shortcuts.
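To see how much local adjacency alone can contribute, consider a baseline that ignores the representations entirely and links each word to its right-hand neighbour. The sketch below scores that chain on the example sentence. The function names are assumptions made here, and the single-sentence figure is only illustrative, not a benchmark result.

```python
def gold_edges(heads):
    """Undirected gold edges from a head list (-1 marks the root)."""
    return {(min(c, h), max(c, h)) for c, h in enumerate(heads) if h >= 0}

def chain_edges(n):
    """Position-only baseline: connect every word to the next one."""
    return {(i, i + 1) for i in range(n - 1)}

def uuas(pred, gold):
    """Fraction of gold undirected edges that the prediction recovers."""
    return len(pred & gold) / len(gold)

heads = [1, 2, -1, 2, 5, 3]  # "The cat sat on the mat", root = sat
baseline = uuas(chain_edges(len(heads)), gold_edges(heads))
print(baseline)  # 0.8
```

On this short sentence the chain already recovers four of the five gold edges, missing only the link between on and mat. That is why a probe's UUAS is informative only relative to baselines of this kind, and why longer sentences with nonlocal attachments are the more telling cases.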

The conditional conclusion is narrower: under this transformation and evaluation, dependency-like geometry is more available than in the chosen baselines.

Citations

Hewitt and Manning (2019), "A Structural Probe for Finding Syntax in Word Representations", for the distance/depth probe.
Hall Maudslay and Cotterell (2021), "Do Syntactic Probes Probe Syntax?", for lexical controls.

Related pages

Probes and Validity for the broader probe-evidence framework.
