Perturbation theory

September 2017 saw the 150th anniversary of Alexander Melville Bell’s vowel model. However innovative it may have seemed, his notion of continuous backness and his class of central vowels were purely hypothetical, since there was no objective method for determining tongue positions. He accepted the single-resonance theory, which offered an explanation for what we now know as F2. Once two formants had been discovered, the double-resonance theory provided an extended explanation. Both theories assumed a unique cavity for each formant. However, it soon became clear that there were numerous formants but no further unique cavities to assign them to. Perturbation theory solved this problem: it sees each formant as a resonance of the entire vocal tract, with a frequency that depends on local modifications of shape.

Perturbation theory describes how individual articulator movements tune the resonance frequencies of the vocal tract. It was applied to speech by Chiba & Kajiyama (1941; The Vowel, its Nature and Structure; reprinted in 1958 by the Phonetic Society of Japan) and Fant (1960; The Acoustic Theory of Speech Production; The Hague, Mouton). For each resonance there is a standing wave between the glottis and the lips, illustrated in the figure below for the first four vowel formants. Each standing wave is characterized by nodes N (where the amplitude of the vibration is zero) and antinodes A (where the amplitude of the vibration is at its maximum). Standing waves can be presented for various properties of the vibration; volume velocity (a measure of the locally vibrating body of air) and sound pressure are the most usual. This figure happens to show volume velocity (for sound pressure, everything is inverted and nodes and antinodes change places, so stick with volume velocity for now).

Standing waves

The nodes and antinodes of the standing waves (volume velocity) for each of the first four formants (adapted from Chiba & Kajiyama 1941)

The figure was drawn for the special case of the uniform tube, a configuration that never really occurs in natural speech (the lips, teeth, tongue tip, uvula, epiglottis and larynx mean there are too many permanent intrusions). Fortunately, the nodes and antinodes remain in roughly the same locations from vowel to vowel. The first formant has a volume velocity antinode at the lips and a node at the glottis. Then each higher formant has an extra pair, so that F1 has one of each, F2 has two of each and so on. They alternate, and are spaced more or less evenly along the vocal tract for each formant. The nodes and antinodes are closer and closer together at higher formants, and become increasingly difficult to aim at selectively. For F3, for example, the lips and tongue tip are still precise enough to have an effect anteriorly. But the tongue body is too blunt to select the nodes and antinodes of the higher formants, and is most effective for F1 and F2.
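The layout of nodes and antinodes described above can be sketched numerically for the idealized uniform tube. The following is a minimal illustration, not taken from the source: it assumes a tube closed at the glottis and open at the lips, in which the volume velocity of formant n varies as sin((2n-1)πx/2L) along the tract. The function names, the tract length of 0.175 m and the sound speed of 350 m/s are illustrative assumptions.

```python
from fractions import Fraction


def node_antinode_positions(n):
    """Volume velocity nodes and antinodes of formant n in a uniform tube.

    Positions are fractions of the tract length, measured from the
    glottis (0) to the lips (1). Assumes an idealized tube that is
    closed at the glottis and open at the lips.
    """
    if n < 1:
        raise ValueError("formant number must be 1 or greater")
    denom = 2 * n - 1
    nodes = [Fraction(2 * k, denom) for k in range(n)]          # zero volume velocity
    antinodes = [Fraction(2 * k + 1, denom) for k in range(n)]  # maximum volume velocity
    return nodes, antinodes


def uniform_tube_formant(n, length_m=0.175, c=350.0):
    """Quarter-wavelength resonance (Hz) of the same idealized tube.

    length_m and c are assumed values chosen for illustration only.
    """
    return (2 * n - 1) * c / (4 * length_m)


if __name__ == "__main__":
    for n in range(1, 5):
        nodes, antinodes = node_antinode_positions(n)
        print(f"F{n}: about {uniform_tube_formant(n):.0f} Hz")
        print("  nodes:    ", ", ".join(str(p) for p in nodes))
        print("  antinodes:", ", ".join(str(p) for p in antinodes))
```

Under these assumptions every formant has a node at the glottis and an antinode at the lips, formant n has n of each, and neighbouring nodes and antinodes lie 1/(2n-1) of the tract length apart, which is why they crowd together at higher formants.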

The articulators can modify a formant frequency by narrowing or widening the vocal tract at or close to the nodes or antinodes, but not in between. Narrowing the vocal tract at a volume velocity antinode lowers that formant frequency. Narrowing at a volume velocity node raises that formant frequency. Widening has the opposite effects. The lips and mandible adjust the mouth opening; the tongue blade acts anteriorly, from the teeth to the hard palate; the tongue body acts from the hard palate to the lower pharynx; and the tongue root and epiglottis act in the lower pharynx. In addition, sphincter muscles at the velum and in the upper and lower pharynx can narrow the vocal tract by squeezing it. The mandible is also involved in speaking style, more open for clearer speech and less open for reduced speech (for example, Figure 9). Finally, larynx depression lengthens the vocal tract posteriorly, balancing the anterior lengthening from lip rounding. All these manoeuvres are active in actual speech. Each may affect several formants, while a given formant may be affected by several articulator movements. It is necessary to look at what is happening everywhere in the vocal tract, not just in the mouth cavity.
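The narrowing and widening rules can be expressed as a small sign calculation. This is a hedged sketch rather than the author's method: it assumes the idealized uniform tube and the usual small-perturbation approximation, in which the sensitivity of formant n at relative position x is -cos((2n-1)πx), equal to +1 at a volume velocity antinode, -1 at a node and 0 midway between them. The function names and the tolerance are hypothetical.

```python
import math


def sensitivity(n, x):
    """Uniform-tube sensitivity of formant n at relative position x.

    x runs from 0 (glottis) to 1 (lips). The value is +1 at a volume
    velocity antinode, -1 at a node and 0 midway between them.
    Assumption: idealized uniform tube, small local area changes.
    """
    return -math.cos((2 * n - 1) * math.pi * x)


def formant_shift_direction(n, x, narrowing=True, tol=1e-9):
    """Direction in which formant n moves for a local area change at x."""
    s = sensitivity(n, x)
    # Narrowing is a negative area change, so it flips the sign.
    change = -s if narrowing else s
    if abs(change) < tol:
        return "no change"
    return "raises" if change > 0 else "lowers"


if __name__ == "__main__":
    # Narrowing at the lips, a volume velocity antinode for every formant.
    print("F1, narrowing at lips:   ", formant_shift_direction(1, 1.0))
    # Narrowing at the glottis, a volume velocity node for every formant.
    print("F1, narrowing at glottis:", formant_shift_direction(1, 0.0))
    # Midway between the F1 node and antinode.
    print("F1, narrowing midway:    ", formant_shift_direction(1, 0.5))
```

In a real vocal tract the nodes and antinodes shift somewhat from vowel to vowel, so the sketch only indicates the direction of a change, not its size.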

There is a persistent tradition, originating from the single-resonance and double-resonance theories, that countless compensatory movements would make articulation unpredictable. For example, Léonce Roudet (1911; La classification des voyelles de M. Sweet; Revue de Phonétique 1:347-356) suggested that tongue retraction would compensate for faulty lip rounding, believing that both activities would lengthen the anterior cavity and lower its resonance frequency. Wood (1986; The acoustical significance of tongue, lip and larynx movements in rounded palatal vowels; Journal of the Acoustical Society of America 80:391-401) demonstrated that this proposal does not work. However, other compensatory behaviour has been reported and verified, such as correcting ongoing labial or lingual activity for mandibular variation. Examples are the degree of palatal constriction for front vowels (Lindblom, Lubker & Gay 1979; Formant frequencies of some fixed-mandible vowels and a model of speech motor programming by predictive simulation; Journal of Phonetics 7:147-161) and the degree of rounding for rounded vowels. A sceptical attitude should be maintained towards proposed compensations that are not supported by evidence, especially proposals that fail to satisfy the constraints of perturbation theory.
