Interpreting vowel articulation from formant frequencies

Southern British English 2: home counties monophthongs

Figure 1: F1/F2 diagram for Home Counties SBE, partially modified to RP. From Wood, Working Papers 23, Lund.

Figure 2: F1/F2 diagram for Home Counties SBE. From Wood, Working Papers 23, Lund.

Figures 1 and 2 show two F1/F2 vowel diagrams. Since the 1950s, it has been customary to interpret F1 as tongue height and F2 as (front-back) tongue position, with reference to the Bell vowel model (Sidney Wood, 1982, X-Ray and Model Studies of Vowel Articulation, Working Papers 23, Department of Linguistics, University of Lund; and The Bell vowel model on this website). Following this tradition, the differences between these two diagrams would be explained by postulated shifts of the tongue body: advanced or retracted, raised or lowered.

A more radical interpretation (Martin Joos, 1948, Acoustic Phonetics, Language Monograph 23, supplement to Language 24) held that phonetic judgements of tongue height and tongue position were in fact exclusively mental analyses of formant patterns, mistakenly expressed in articulatory terms. In either case, recall that the Bell vowel model was already flawed before it was launched in 1867, and that Bell himself abandoned it almost immediately (Sidney Wood 1982, and The acoustic weaknesses of Bell’s vowel model on this website).

The relationship between vowel articulation and formant frequencies is much more complex. Simplicity may be beautiful in science, but there are always limits beyond which causal relationships and explanatory power are compromised, and the Bell vowel model is definitely beyond them.

Bell vowel model: its acoustic weaknesses
Figure 3. Standing waves for each of four formants (F1-F4, each with its own profile). The tubes represent the entire vocal tract (lips to the left). The standing waves are drawn for volume velocity; A and N mark the locations of the antinodes (bellies) and nodes, respectively, showing where local narrowing or widening will shift a formant frequency. Adapted from Chiba, T., and Kajiyama, M., 1941, The Vowel, its Nature and Structure, Tokyo.

First, some theory (Fig. 3). Each formant corresponds to a standing wave in the entire vocal tract, and the frequency of any formant can be modified by widening or narrowing the vocal tract at any node or antinode of that formant’s standing wave. Standing waves can be drawn for various properties of the vibration; sound pressure and volume velocity are the most usual. Volume velocity is the volume of air flowing through a cross-section of the tract per unit time, a measure of the movement of the locally vibrating body of air. Figure 3 shows volume velocity standing waves for four formants; the principle is the same for all formants.
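To make the two kinds of standing wave concrete, here is a minimal sketch for the uniform tube of Fig. 3, treated as closed at the glottis and open at the lips. The tube length (17.5 cm) and the helper name `standing_wave_profiles` are assumptions for illustration; the sine and cosine profiles are the textbook lossless quarter-wave case, not measurements from a real vocal tract.

```python
import math

def standing_wave_profiles(n, x_from_glottis, length_cm=17.5):
    """Relative amplitudes of volume velocity and sound pressure for formant n
    at a point x_from_glottis (cm). Hypothetical helper: assumes a uniform,
    lossless tube closed at the glottis and open at the lips."""
    k = (2 * n - 1) * math.pi / (2 * length_cm)   # wavenumber of the nth resonance
    velocity = abs(math.sin(k * x_from_glottis))  # zero at the closed glottis end
    pressure = abs(math.cos(k * x_from_glottis))  # zero at the open lip end
    return velocity, pressure

# The two standing waves are a quarter-wavelength out of step:
# wherever volume velocity peaks, pressure vanishes, and vice versa.
for x in (0.0, 17.5 / 3, 17.5):
    u, p = standing_wave_profiles(2, x)
    print(f"F2 at {x:5.2f} cm from glottis: velocity {u:.2f}, pressure {p:.2f}")
```

This is why a volume velocity antinode is also a pressure node: choosing either property describes the same standing wave.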

  • The rules for volume velocity standing waves are:
    • Narrowing the vocal tract locally at an antinode lowers that formant’s frequency
    • Narrowing the vocal tract locally at a node raises that formant’s frequency
    • Widening the vocal tract locally at a node or antinode does the opposite of narrowing
  • Hint: remember that narrowing the lip opening lowers the frequencies of all formants (all formants have an antinode at the lips).
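The rules above can be expressed as a small-perturbation calculation. This sketch assumes a uniform, lossless tube and uses the standard energy argument: narrowing where kinetic energy (volume velocity) dominates lowers a formant, and narrowing where potential energy (pressure) dominates raises it. The helper name `formant_shift` and the 17.5 cm length are hypothetical.

```python
import math

def formant_shift(n, x_from_glottis, change, length_cm=17.5):
    """Direction in which formant n moves when the tube is narrowed or widened
    locally at x_from_glottis (cm). Hypothetical helper, assuming a uniform,
    lossless tube and a small perturbation.

    The sign comes from the local balance of kinetic and potential energy,
    sin^2(kx) - cos^2(kx) = -cos(2kx): positive near volume velocity
    antinodes, negative near nodes."""
    if change not in ("narrow", "widen"):
        raise ValueError("change must be 'narrow' or 'widen'")
    k = (2 * n - 1) * math.pi / (2 * length_cm)
    sensitivity = -math.cos(2 * k * x_from_glottis)
    area_sign = -1 if change == "narrow" else 1  # narrowing reduces local area
    effect = sensitivity * area_sign
    if abs(effect) < 1e-6:
        return "no first-order change"  # midway between a node and an antinode
    return "raised" if effect > 0 else "lowered"

# The hint: narrowing at the lips (an antinode for every formant) lowers all formants.
print([formant_shift(n, 17.5, "narrow") for n in (1, 2, 3, 4)])
```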

So where are the nodes and antinodes of each formant’s standing wave? Figure 3 was drawn for the special case of a uniform tube, a configuration that never occurs in natural speech (the lips, teeth, tongue tip, uvula, epiglottis and larynx introduce too many intruding lumps). But the good news is that it doesn’t matter: the nodes and antinodes don’t move about very much from vowel to vowel. As Figure 3 shows, a volume velocity standing wave always has an antinode at the lips and a node at the glottis, and each higher formant adds one more pair. So F1 has one of each, F2 has two of each, F3 has three of each, and so on. Nodes and antinodes alternate and are spaced more or less evenly along the vocal tract.
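For the uniform tube, the node and antinode positions, and the corresponding formant frequencies F_n = (2n - 1)c/4L, can be written down directly. The 17.5 cm length and c = 350 m/s are assumed values, and the helper names are hypothetical; real vowels shift these positions and frequencies somewhat, so this only illustrates the counting rule.

```python
def nodes_and_antinodes(n, length_cm=17.5):
    """Positions (cm from the glottis) of volume velocity nodes and antinodes
    for formant n in a uniform tube closed at the glottis and open at the lips.
    Hypothetical helper; 17.5 cm is an assumed adult vocal tract length."""
    step = length_cm / (2 * n - 1)                      # node-to-antinode spacing
    nodes = [2 * m * step for m in range(n)]            # first node at the glottis
    antinodes = [(2 * m + 1) * step for m in range(n)]  # last antinode at the lips
    return nodes, antinodes

def uniform_tube_formant(n, length_cm=17.5, speed_of_sound_cm_s=35000.0):
    """Quarter-wavelength resonance F_n = (2n - 1) c / (4 L), in Hz."""
    return (2 * n - 1) * speed_of_sound_cm_s / (4 * length_cm)

for n in (1, 2, 3):
    nodes, antinodes = nodes_and_antinodes(n)
    print(f"F{n} = {uniform_tube_formant(n):.0f} Hz, "
          f"nodes at {[round(x, 1) for x in nodes]} cm, "
          f"antinodes at {[round(x, 1) for x in antinodes]} cm")
```

F1 gets one node and one antinode, F2 two of each, F3 three of each, alternating at even spacing, as described above.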

Figure 4. Area functions for Home Counties SBE (left) and Cairo Arabic (right), four instances of each vowel from each speaker, demonstrating just four constriction locations for all vowels (from top to bottom: along the hard palate, along the velum, in the upper pharynx and in the lower pharynx). The area functions are lined up from the central incisors (coordinate 0 cm). The letters identify parts of the vocal tract: LP lips; HP hard palate; SP soft palate; U uvula; PHA pharynx; LX larynx. From Wood, 1979, Journal of Phonetics 7:25-43.

All this means that we need to look for local shape changes, local narrowing and widening, all along the vocal tract. Fortunately, just four constriction locations matter for vowels (Fig. 4): along the hard palate for [i ɪ e ɛ] and [y ʉ ø œ], along the velum for [u ʊ] and [ɯ], in the upper pharynx for [o ɔ] and [ɤ ʌ], and in the lower pharynx for [æ a ɑ] and [ɒ]. This confirms and extends Stevens’ observations on the quantal nature of speech (1972, in David, E. E., and Denes, P. B. (eds), Human Communication, a Unified View, pp. 51-66).
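The grouping of vowels by constriction location amounts to a lookup table. This sketch encodes exactly the four locations and vowel symbols listed above; the names `CONSTRICTION_LOCATIONS` and `constriction_location` are hypothetical.

```python
# Vowel symbols per constriction location, as listed in the text.
CONSTRICTION_LOCATIONS = {
    "hard palate": ["i", "ɪ", "e", "ɛ", "y", "ʉ", "ø", "œ"],
    "velum": ["u", "ʊ", "ɯ"],
    "upper pharynx": ["o", "ɔ", "ɤ", "ʌ"],
    "lower pharynx": ["æ", "a", "ɑ", "ɒ"],
}

# Inverted index so each vowel symbol maps straight to its location.
VOWEL_TO_LOCATION = {
    vowel: location
    for location, vowels in CONSTRICTION_LOCATIONS.items()
    for vowel in vowels
}

def constriction_location(vowel):
    """Constriction location for an IPA vowel symbol, or None if the symbol
    is not among those listed (hypothetical helper)."""
    return VOWEL_TO_LOCATION.get(vowel)

print(constriction_location("ɔ"))
```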

At each constriction location, vowel timbres are varied by modifying the degree of constriction, tongue body posture, tongue blade elevation, lip activity and larynx depression.

Southern British English 2: formants and articulation

Figure 5. The frequencies of F1 and F2 generated by the three-parameter model for the four preferred constriction locations, based on nomograms by Stevens and House (1955, Development of a quantitative description of vowel production, Journal of the Acoustical Society of America, 27:484-495); distance from source to constriction 12 cm for the hard palate, 8.5 cm for the soft palate, 6.5 cm for the upper pharynx, and 4.5 cm for the lower pharynx. The parameters are explained in the text. The superimposed vowel areas are from a sample of Home Counties SBE recorded from the radio in the 1970s. From Wood, 1979, Journal of Phonetics 7:25-43.
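As a rough cross-check, the uniform-tube sensitivity can be evaluated at the four source-to-constriction distances in the Fig. 5 caption. This is a first-order sketch, assuming a 17.5 cm uniform lossless tube; it does not reproduce the Stevens and House nomograms, which model large constrictions and lip opening explicitly, and values near zero (such as F1 at the soft palate) mean the first-order estimate says little there. The names are hypothetical.

```python
import math

TRACT_LENGTH_CM = 17.5  # assumed vocal tract length (not from the source)

# Distance from the glottis (source) to each constriction, from the Fig. 5 caption.
CONSTRICTION_DISTANCE_CM = {
    "hard palate": 12.0,
    "soft palate": 8.5,
    "upper pharynx": 6.5,
    "lower pharynx": 4.5,
}

def sensitivity(n, x_cm, length_cm=TRACT_LENGTH_CM):
    """First-order sensitivity of formant n to a local area change at x_cm.
    Positive: narrowing there lowers the formant; negative: narrowing raises it.
    Assumes a uniform, lossless tube closed at the glottis."""
    k = (2 * n - 1) * math.pi / (2 * length_cm)
    return -math.cos(2 * k * x_cm)

for place, x in CONSTRICTION_DISTANCE_CM.items():
    s1, s2 = sensitivity(1, x), sensitivity(2, x)
    print(f"{place:14s} F1 {s1:+.2f}  F2 {s2:+.2f}")
```

Under these assumptions, narrowing at the hard palate comes out as lowering F1 and raising F2, and narrowing in the lower pharynx as raising F1 and lowering F2.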

One step remains: to translate Fig. 4 into a vowel formant diagram showing how each vowel is articulated. This is done in Fig. 5, which shows how varying the degree of constriction (Amin, cm²) and the lip opening (A/l, cm) at each of the four constriction locations reproduces the zones where [i-ɛ]-like, [u-ʊ]-like, [o-ɔ]-like and [æ-ɑ]-like timbres are expected. A real-life vowel system is superimposed for comparison. The two do not match perfectly, mainly because of their different vocal tract sizes, just as the F1/F2 vowel diagrams of different real-life speakers fail to match perfectly (compare Figs. 1 and 2 again). The speaker of the English vowel system in Fig. 5 will also have used tongue blade and larynx movements, and divided his lip opening between lip and jaw movement, adding to the complexity of his articulation. But the two are similar and close, so general conclusions are possible. For example, comparing the real /u/ and /ʊ/ zones with the model velar zone, variation within /u/ is mainly due to lip variation, whereas the difference between /u/ and /ʊ/ is mainly a more open constriction and less lip rounding for /ʊ/.
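The effect of vocal tract size can be illustrated with the simplest case. For a uniform tube every formant is proportional to 1/L, so a shorter tract shifts the whole formant pattern up by the same factor. This sketch assumes tract lengths of 17.5 cm and 15 cm and c = 350 m/s; real speakers’ vocal tracts need not scale uniformly, so this shows only the first-order effect.

```python
def uniform_tube_formants(length_cm, count=3, speed_of_sound_cm_s=35000.0):
    """First `count` resonances (Hz) of a uniform tube closed at one end,
    F_n = (2n - 1) c / (4 L). Hypothetical helper for illustration."""
    return [(2 * n - 1) * speed_of_sound_cm_s / (4 * length_cm)
            for n in range(1, count + 1)]

longer = uniform_tube_formants(17.5)   # assumed longer vocal tract
shorter = uniform_tube_formants(15.0)  # assumed shorter vocal tract
for n, (f_long, f_short) in enumerate(zip(longer, shorter), start=1):
    print(f"F{n}: {f_long:.0f} Hz vs {f_short:.0f} Hz (ratio {f_short / f_long:.3f})")
```

Every ratio equals 17.5/15, which is why two vowel systems produced with the same articulatory strategy can still sit in different places on an F1/F2 diagram.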