The acoustic weaknesses of the Bell vowel model

The references can be opened in a separate page here.

  1. The single resonance and two resonance theories
  2. Transmission line theory
  3. How vowel articulation is related to vowel spectra

1. The single resonance and two resonance theories

Bell coupled his new vowel model to the then-popular single resonance theory, claiming the vowel tone (resonance) depended on the dimensions of the buccal cavity. In Visible Speech (1867:71), he postulated a “point of greatest contraction, or the configurative aperture”, that “may be shifted to any part of the back or front of the palatal arch”, determined by the tongue location along his revolutionary innovation of a front-back continuum (Fig. 1).

Figure 1.
Bell’s acoustic theory: a configurative aperture (0), located anywhere along the hard or soft palates by tongue fronting or backing, is the posterior limit of the buccal cavity, whose resonance determines vowel timbre. The dimension of the aperture (0) is set by tongue height.

Bell’s tongue height dimension replaced the mouth opening (or jaw opening) of the ancient throat-tongue-lip model. Tongue height explicitly adjusted the degree of constriction in the configurative aperture, but, unfortunately, Bell did not explain its spectral effect. Roudet (1911) pointed out that both widening and lengthening of the buccal cavity would have the same effect on its volume and consequently on its resonance frequency, implying that both tongue backing and tongue lowering would be mutually compensatory in Bell’s model, rather than behaving as the independent parameters that Bell had intended.
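Roudet's objection can be illustrated with a small numerical sketch. It assumes, purely for illustration, that the buccal cavity behaves like an idealized Helmholtz resonator: a box-shaped cavity behind a short neck. The formula, the function names and all the dimensions are assumptions made for this example, and are not taken from Bell or Roudet. With the neck held constant, the resonance depends only on the cavity volume, so doubling the volume by lengthening (tongue backing) or by heightening (tongue lowering) gives the same result.

```python
import math

def helmholtz_frequency(volume_m3, neck_area_m2, neck_length_m, speed_of_sound=350.0):
    """Resonance (Hz) of an idealized Helmholtz resonator (illustrative assumption)."""
    return speed_of_sound / (2 * math.pi) * math.sqrt(
        neck_area_m2 / (volume_m3 * neck_length_m)
    )

def cavity_volume(length_m, width_m, height_m):
    """Volume of a hypothetical box-shaped buccal cavity."""
    return length_m * width_m * height_m

# Hypothetical dimensions, chosen only to make the comparison concrete.
NECK_AREA = 1.0e-4    # m^2
NECK_LENGTH = 0.01    # m

base = cavity_volume(0.04, 0.03, 0.01)
lengthened = cavity_volume(0.08, 0.03, 0.01)   # cavity made longer (tongue backing)
heightened = cavity_volume(0.04, 0.03, 0.02)   # cavity made taller (tongue lowering)

for label, volume in [("base", base), ("lengthened", lengthened), ("heightened", heightened)]:
    print(label, round(helmholtz_frequency(volume, NECK_AREA, NECK_LENGTH)), "Hz")
```

The two enlarged cavities have the same volume and so print the same frequency, which is lower than that of the base cavity. Under this idealization the two manoeuvres are indistinguishable.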

However, the single resonance theory had already been shown to be flawed before Bell’s book was even published. In 1863, Helmholtz had reported two resonances for nonrounded vowels (p. 107 in Ellis’s English translation, which was not available before 1885). Bell’s son Graham (1879), following and extending Helmholtz, recognized two resonances in every vowel, assigning them respectively to the anterior cavity and posterior cavity (i.e. the buccal cavity and the pharynx). Graham Bell corresponded with Ellis, who included a footnote in the Helmholtz translation. Lloyd (1890-92), familiar with both Helmholtz’s and Graham Bell’s work, assigned the two resonances to the “porch” and the “chamber”, which amounted to the same thing. Moreover, he discovered a third resonance that he assigned to a cavity between the lips. All these discoveries should have stopped A. M. Bell’s model. But hardly anyone listened. A. M. Bell apparently did, however, and never mentioned it again.

And there the story would have ended but for Henry Sweet. For A. M. Bell’s universal transcription system, which he launched in Visible Speech, was never a success and was soon forgotten. And his new vowel model would have been forgotten along with it, if Sweet hadn’t promoted it, assisted by Sievers in Germany. By the new century, practically the whole world had adopted it. It offered so many more vowel timbres (273 in Bell’s presentation), and promised infinitely more (the slightest tongue shift in any direction was claimed to produce a new timbre), so why be troubled by weak acoustics? Or by weak physiology either (but that’s a separate story).

There was no way of testing Bell’s configurative aperture until probing methods were introduced (Grandgent 1890, Atkinson 1898) and then radiography (Scheier 1909). It turned out not to be easy to find, and by the time of Daniel Jones reference was being made to the highest point of the tongue instead.

Among phoneticians in general, the number of resonances remained a controversial issue, many still preferring the single resonance theory. One possible reason is that their basic textbooks, especially Sweet’s and later Jones’s, continued to rely on it, providing them with what appeared to be a rational tool. This was also a period of severe dispute between the “acoustic” and “organic” schools as to the primacy of sound or articulation for speech communication, adding to any uncertainty. The acoustic school (e.g. Pipping 1893) argued that the irrelevance of articulation for speech sound was proved by playing back speech recordings, presumably wax cylinders played through the horn of a phonograph (naively and deceptively overlooking the articulation of the subject who was originally recorded). Another acoustic school argument (e.g. Roudet 1911) claimed that mutually compensatory manoeuvres (citing lip rounding and tongue retraction) would render articulation unpredictable. Wood (1986) demonstrated that this particular alleged compensation does not work owing to the quantal principle (Stevens 1972). Lloyd (1891-95) pleaded that the organic and acoustic lines of investigation are complementary and that phonetic science would benefit from any investigation that contributes to unification. In vain. Pipping (1893) wrote a scathing review of Lloyd’s publications.

Paget (1922, 1930 chapts 3-5), a student of Daniel Jones around 1920, also identified two resonances for both unrounded and rounded vowels, initially unaware of earlier work. By then, the two resonances were generally known as the mouth and throat formants, with Bell’s elusive configurative aperture seen as a dividing isthmus separating the two cavities (remember, no-one had actually verified Bell’s configurative aperture yet, whether by probing or by X-ray, and they still have not).

Lloyd’s and Paget’s studies were reworked and confirmed by Crandall (1927).

After 1945, measurements made on spectrograms from the then recently invented Kay Sonagraph revealed that the frequencies of F1 and F2 were associated with judgments of tongue height and backing, seemingly confirming that tongue height was determining a throat resonance and F1, and tongue backing a mouth resonance and F2. Alas, the investigators found a correlation and tumbled into the classic pitfall of mistaking it for causality. No-one has ever, yet, demonstrated a consistent and unique causal relationship between true tongue height and F1 frequency, or true tongue backing and F2 frequency. It just does not work.

The overshadowing difficulty was the higher resonances. A third resonance had been glimpsed ever since the 19th century, and spectrograms now revealed several more. The problem was not acute for speech perception or specification, just two, occasionally three, formants being deemed adequate for reporting and discrimination. But the higher formants were a reminder that the theory was far from adequate and might be hiding surprises. There just are not enough distinct cavities in the vocal tract for each higher resonance to have its own. The solution would yet again demonstrate that Bell’s tongue backing and tongue height are irrelevant parameters for shaping the vocal tract and tuning it for vowel spectra.

2. Transmission line theory

Suppose, for a moment, a home might be found for F3 in its own unique cavity between the lips, as Lloyd had proposed more than 100 years ago. Yet anyone familiar with reading spectrograms will also know that F3 rises to its maximum when the tongue blade is elevated for dental consonants, narrowing the palato-alveolar region (part of the buccal cavity where F2 was said to reside). Evidently, F3 cannot be assigned to a unique cavity. Something else is happening here, something unforeseeable by Bell (Rayleigh’s two volumes on the theory of sound were not published until some ten years after Bell’s Visible Speech).

The Bell model had been extended from the single resonance theory to the two-resonance theory by stretching the imagination a little (by supposing Bell’s configurative aperture really existed, and by pretending this imagined aperture constituted an isthmus between a mouth cavity and a throat cavity). But a third resonance was too much for it. The third resonance was homeless. And we still have not looked at F4, F5 and beyond.

Joos (1948:57-59) pointed out that the mathematical theory does not allow for resonances located in minor side chambers or irregularities. Instead, each resonance is a mode of oscillation of the entire vocal tract, its frequency being sensitive to local expansion or narrowing at its respective nodes or antinodes (bellies). Consequently, a given articulator manoeuvre will modify several resonances, and a given resonance can be modified in several different parts of the vocal tract. Chiba and Kajiyama (1941, available more widely as 1958) were the first to pursue this route, followed by Fant (1960) and Stevens and House (1961). Fant referred to this approach as transmission line theory (modelling the vocal tract with electronic circuits for measurements and calculations).
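The idea that each resonance is a mode of the whole tract, rather than the property of one cavity, can be sketched with the simplest possible case. The example below assumes a lossless uniform tube that is closed at the glottis and open at the lips. The quarter-wavelength formula, the 17.5 cm length and the 350 m/s speed of sound are textbook idealizations chosen for illustration, not values taken from the authors cited above.

```python
def uniform_tube_resonances(length_m=0.175, speed_of_sound=350.0, count=4):
    """Resonance frequencies (Hz) of an idealized uniform tube.

    Assumes a lossless tube closed at one end (glottis) and open at the
    other (lips), so mode n resonates at (2n - 1) * c / (4 * L).
    """
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, count + 1)
    ]

for number, frequency in enumerate(uniform_tube_resonances(), start=1):
    print(f"F{number}: about {frequency:.0f} Hz")
```

With these assumed values the four resonances come out at roughly 500, 1500, 2500 and 3500 Hz. A single undivided tube therefore yields a whole series of resonances, with no need for a separate cavity for each one.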

Fant (1965) demonstrated once again (i) the irrelevance of tongue positions defined by height and backing ([a]-like vowels need a low pharyngeal constriction), (ii) the affiliation of F2 to both the mouth and the throat regions (refuting theories assigning each resonance uniquely to its own cavity), and (iii) the futility of searching for individual cavities for the higher resonances (discredited by the standing wave phenomena revealed by transmission line theory).

So, yet again, the acoustic theory behind the Bell vowel model was flawed. At this point it would be easy to join the old acoustic school and claim that articulation is irrelevant. However, there are some simple rules that link vowel articulation with the spectral properties of vowels. These are introduced in the next section, and described in detail separately.

3. How vowel articulation is related to vowel spectra

There are some very simple rules that tell us whether expanding or narrowing various parts of the vocal tract will raise or lower a particular resonance frequency. First, we need to examine standing waves in order to see where the nodes and antinodes occur in the vocal tract for each resonance, since these are the places where a resonance frequency can be modified by movement of an appropriate speech articulator (lips, mandible, tongue tip and blade, tongue body, and by constricting sphincter muscles at the velum and in the pharyngeal walls).

Figure 2 illustrates the locations of the nodes (N) and antinodes (A) of each standing wave for four resonances (corresponding to F1, F2, F3, and F4), adapted from Chiba and Kajiyama.

Figure 2. Standing wave examples computed by Chiba and Kajiyama for volume velocity (a local measure of the vibrating air) along a uniform tube that is open at one end. The fish-shaped curves within the tube record the volume velocity amplitudes at different places along the tube. Locations with minimum volume velocity are the nodes (N), locations with maximum volume velocity are the antinodes (A). The nodes and antinodes are also shown in their corresponding positions in a vocal tract profile. Note that similar standing waves can be computed for sound pressure instead, the difference being that sound pressure has nodes where volume velocity has antinodes, and antinodes where volume velocity has nodes.
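The node and antinode locations in such a tube can be sketched numerically. The example assumes an idealized lossless uniform tube, closed at the glottis (position 0) and open at the lips, in which the volume velocity amplitude of mode n varies as sin((2n - 1)πx / 2L). The 17.5 cm length and the function name are assumptions for illustration, and real vocal tract shapes will shift these positions somewhat.

```python
def volume_velocity_extrema(mode, length_cm=17.5):
    """Node and antinode positions (cm from the glottis) for one resonance.

    Assumes volume velocity amplitude proportional to
    sin((2 * mode - 1) * pi * x / (2 * length_cm)) in a uniform tube
    closed at x = 0 and open at x = length_cm.
    """
    m = 2 * mode - 1
    nodes = [2 * k * length_cm / m for k in range(mode)]            # amplitude is zero
    antinodes = [(2 * k + 1) * length_cm / m for k in range(mode)]  # amplitude is maximal
    return nodes, antinodes

for mode in range(1, 5):
    nodes, antinodes = volume_velocity_extrema(mode)
    print(f"F{mode}: nodes at {[round(x, 1) for x in nodes]} cm, "
          f"antinodes at {[round(x, 1) for x in antinodes]} cm")
```

Under these assumptions, mode n has n nodes and n antinodes, always with a node at the glottis and an antinode at the open lip end.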

The good news is that each node and antinode stays more or less in the same part of the vocal tract for any vowel, so the rules apply generally to the vocal tract shapes of all vowels. Start off by remembering that the frequencies of all formants are lowered by narrowing the lip opening. That is the key to remembering how the rules work. Look at all four diagrams in Fig. 2 and see that there is always a volume velocity antinode (A) at the mouth opening. This will remind us of the first rule: narrowing the vocal tract at a volume velocity antinode lowers the frequency of that resonance. Narrowing the vocal tract at a volume velocity node raises the frequency of that resonance. Widening the vocal tract at a volume velocity antinode or node does the opposite.
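These rules can be expressed as a small sketch. It assumes the same idealized uniform tube (closed at the glottis, open at the lips) and uses a first-order sensitivity value, the local kinetic energy minus the local potential energy of the standing wave, which for this tube reduces to -cos((2n - 1)πx / L). That expression and the names below are assumptions made for illustration. They are not taken from the text above, and they hold only for small local changes in cross-section.

```python
import math

def narrowing_effect(mode, x_cm, length_cm=17.5):
    """Direction in which a slight local narrowing at x_cm shifts resonance `mode`.

    Sensitivity is +1 at a volume velocity antinode and -1 at a node
    (idealized uniform tube, closed at x = 0, open at x = length_cm).
    """
    sensitivity = -math.cos((2 * mode - 1) * math.pi * x_cm / length_cm)
    if sensitivity > 1e-9:
        return "lowers"    # antinode-like region: narrowing lowers the frequency
    if sensitivity < -1e-9:
        return "raises"    # node-like region: narrowing raises the frequency
    return "no first-order change"

def widening_effect(mode, x_cm, length_cm=17.5):
    """Widening does the opposite of narrowing."""
    opposite = {"lowers": "raises", "raises": "lowers"}
    effect = narrowing_effect(mode, x_cm, length_cm)
    return opposite.get(effect, effect)

for mode in range(1, 5):
    print(f"F{mode}: narrowing at the lips {narrowing_effect(mode, 17.5)} it, "
          f"narrowing at the glottis {narrowing_effect(mode, 0.0)} it")
```

For every mode the sketch reports that narrowing at the lips lowers the resonance and narrowing at the glottal end raises it, which matches the lip-opening reminder given above.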

©Sidney Wood and SWPhonetics, 1994-2014