What are formants?

Praat for Beginners:
Tutorial: What are formants?

  • A formant is a concentration of acoustic energy around a particular frequency in the speech wave. There are several formants, each at a different frequency, roughly one in each 1000Hz band for average men. The corresponding range for average women is one formant every 1100Hz. The true range depends on the actual length of the vocal tract. Each formant corresponds to a resonance mode of the vocal tract.
  • Formants can be seen very clearly in a wideband spectrogram, where they are displayed as dark bands. The darker a formant appears in the spectrogram, the stronger it is (the more energy there is around its frequency, or the more audible it is):
  • The green arrows at F on this spectrogram point out six instances of the lowest formant. The next formant occurs just above these, between 1 and 2 kHz. Then the next is just above that, between 2 and 3 kHz. And so on.
  • When you look at a spectrogram, like this example, you will see formants everywhere, in both vowels and consonants. To understand why, you must recall the source-filter theory of speech production. The vocal tract filters a source sound (e.g. periodic voice vibrations or aperiodic hissing) and the result of the filtering is the sound you can hear and record outside the lips and show on a spectrogram. Formants occur, and are seen on spectrograms, around frequencies that correspond to the resonances of the vocal tract, i.e. at frequencies where the impedance is low (impedance is resistance to vibration at a given frequency).
  • But there is a difference between oral vowels on the one hand, and consonants and nasal vowels on the other. For consonants, there are also antiresonances in the vocal tract at one or more frequencies due to oral constrictions. An antiresonance is the opposite of a resonance, such that the impedance is high rather than low at those frequencies. Consequently, antiresonances attenuate or eliminate formants at or near these frequencies, so that the formants appear weakened, or are missing altogether, when you look at spectrograms. That is why, for example, it is difficult to see formants below 3000 or 4000Hz for the two instances of [s] in the spectrogram above.
  • In addition, for nasal consonants and nasal vowels, the vocal tract divides into a nasal branch and an oral branch, and interference between these branches produces more antiresonances. Furthermore, nasal consonants and nasal vowels can exhibit additional formants, nasal formants, arising from resonance within the nasal branch. Consequently, nasal vowels may show one or more additional formants due to nasal resonance, while one or more oral formants may be weakened or missing due to nasal antiresonance.
   By convention, oral formants are numbered consecutively upwards from the lowest frequency. The example to the left is a fragment from the previous wideband spectrogram and shows the sequence [ins] from the beginning. Five formants are visible in this [i], labelled F1-F5. Four are visible in this [n] (F1-F4) and there is a hint of the fifth. There are four more formants between 5000Hz and 8000Hz in [i] and [n] but they are too weak to show up on the spectrogram, and mostly they are also too weak to be heard. The situation is reversed in this [s], where F4-F9 show very strongly, but there is little to be seen below F4.
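
The spacing of roughly one formant per 1000Hz band, described at the start of this tutorial, can be approximated by treating the vocal tract as a uniform tube that is closed at the glottis and open at the lips. Such a tube resonates at odd multiples of c/4L, where c is the speed of sound and L is the tube length. The Python sketch below is only an illustration under that simplifying assumption: the speed of sound (350 m/s) and the two tract lengths (17.5 cm and 15.9 cm) are assumed round figures, not measurements from the spectrograms shown here, and a real vocal tract is not a uniform tube.

```python
# Quarter-wavelength resonator sketch: a uniform tube closed at one end
# (glottis) and open at the other (lips) resonates at
#     F_n = (2n - 1) * c / (4 * L)

SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value, about 350 m/s


def tube_formants(tract_length_cm, count=5):
    """Return the first `count` resonance frequencies (Hz) of the idealized tube."""
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * tract_length_cm)
        for n in range(1, count + 1)
    ]


def formant_spacing(tract_length_cm):
    """Distance in Hz between neighbouring resonances: c / (2 * L)."""
    return SPEED_OF_SOUND_CM_PER_S / (2.0 * tract_length_cm)


if __name__ == "__main__":
    # Both lengths are illustrative assumptions, not values from the tutorial.
    for label, length_cm in [("longer tract", 17.5), ("shorter tract", 15.9)]:
        formants = [round(f) for f in tube_formants(length_cm)]
        spacing = round(formant_spacing(length_cm))
        print(f"{label} ({length_cm} cm): {formants}, spacing about {spacing} Hz")
```

Under these assumptions the 17.5 cm tube has resonances at 500, 1500, 2500Hz and so on, i.e. one in each 1000Hz band, while the shorter tube widens the spacing to about 1100Hz. Real formants deviate from these values because the tongue, lips and jaw change the shape of the tube.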
  • Formants can also be seen very clearly in another type of diagram, the spectral slice, which shows the sound spectrum at a moment in time. The following examples show spectra taken during the same [i] and [s] from the beginning of the spectrogram:
  • Seen this way, the sound spectra look like mountain landscapes and the formants appear as peaks, a metaphor that is often used for formants.
  • The energy in a formant comes from the sound source. In the case of the voiced vowel [i], it is the periodic vibration of the vocal folds, producing a series of harmonic tones. The next picture is a narrowband spectral slice from the same vowel [i], showing these harmonic tones. Harmonics whose frequencies are close to a resonance frequency of the vocal tract pass freely through the vocal tract, producing a formant. Harmonics whose frequencies are not close to resonance frequencies do not pass freely through the vocal tract; they become weakened and form troughs between the formant peaks:
  • In this example, the harmonics occur at an interval of about 110Hz, so there are about 3 harmonics in the actual peak of each formant, and perhaps 1 or 2 more on the flanks.
  • In the case of the voiceless fricative [s], there is aperiodic hissing due to a forced jet of air hitting the front teeth, causing turbulence at all frequencies. The turbulence spreads through the vocal tract, and, again, passes freely at frequencies close to resonance frequencies but does not pass freely at frequencies between resonance frequencies, producing stronger formant peaks with weaker troughs between them. The next picture is a narrowband spectral slice from the same [s] in [ins]:
  • All vowels can be characterized by F1 and F2. For example, look at the example spectrogram again, and compare the first and last vowels:
  • In the first vowel, F2 is high (close to F3), but in the last vowel it is low (close to F1). Vowels traditionally known as front have F1 and F2 a good distance apart, like the first vowel here. Vowels traditionally known as back have F1 and F2 so close that they touch, like the last vowel here.
  • But a more complete description of front vowels requires at least F3 as well, which differentiates between [i] and [y] etc.
  • If you are studying the spectral qualities of singing, you will also want to look at F3 and F4, which can be made much stronger in singing than in speaking. Trained singers manipulate F3 and F4 by lowering the larynx and elevating the tongue blade to enhance this part of the spectrum and make it heard above an orchestral accompaniment. The following narrowband spectrogram, of the phrase Gloria in excelsis Deo sung by a baritone voice, illustrates this enhanced F3+F4 region around 2-4kHz:
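
The F1/F2 description of vowels given above (F2 close to F3 in front vowels, F2 close to F1 in back vowels) can be expressed as a simple comparison of distances. The Python sketch below is a rough illustration only: the function name `f2_position` and the formant values in the example are hypothetical, chosen as plausible round figures rather than measured from the spectrograms in this tutorial, and any serious classification would need real measurements and speaker normalisation.

```python
def f2_position(f1_hz, f2_hz, f3_hz):
    """Report whether F2 lies nearer to F3 ("front-like") or nearer to F1 ("back-like")."""
    if not (f1_hz < f2_hz < f3_hz):
        raise ValueError("expected F1 < F2 < F3")
    gap_to_f1 = f2_hz - f1_hz  # small gap: F1 and F2 close together
    gap_to_f3 = f3_hz - f2_hz  # small gap: F2 and F3 close together
    return "front-like" if gap_to_f3 < gap_to_f1 else "back-like"


if __name__ == "__main__":
    # Hypothetical formant values in Hz, for illustration only.
    examples = {
        "[i]-like vowel": (280, 2250, 2900),
        "[u]-like vowel": (310, 870, 2250),
    }
    for label, (f1, f2, f3) in examples.items():
        print(label, "->", f2_position(f1, f2, f3))
```

With these assumed values the first vowel is reported as front-like (F2 is 650Hz below F3 but 1970Hz above F1) and the second as back-like. As noted above, F1 and F2 alone do not separate [i] from [y], so a fuller description would also compare F3.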
© Sidney Wood and SWPhonetics, 1994-2012