Praat for Beginners:
Tutorial: Spectral analysis
What are formants?
Quick guide to spectral analysis
1. Spectral analysis of speech
- The purpose of spectral analysis is to find out how acoustic energy is distributed across frequency. Typical uses in phonetics are discovering the spectral properties of the vowels and consonants of a language, comparing the productions of different speakers, or finding characteristics that point forward to speech perception or back to articulation.
- Formerly, calculation was so time-consuming that it was more practical to work at the lab bench, passing the signal through bandpass filters and measuring the filter output at a range of frequencies. From the 1950s onward, this was done by the spectrograph, which burnt a spectrogram onto paper as a permanent record. Nowadays, a suitable computer program will calculate speech spectra in seconds.
- There are two methods for spectral analysis: the fast Fourier transform (FFT) and linear prediction (LPC). FFT finds the energy distribution in the actual speech sound, whereas LPC estimates the vocal tract filter that shaped that speech.
- A strict distinction between resonance as a filter property and the sound energy peaks shaped by it has hardly ever been maintained. The term formant has usually been applied loosely to both concepts ever since it was coined in the early 20th century to describe vocal tract resonance and the timbre of musical instruments.
- The advantage of FFT is easier setup; the disadvantage is the difficulty of identifying formants in speakers with higher-pitched voices. LPC has better success with high-pitched voices, but the settings need to be carefully tuned for each speaker.
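As a rough illustration of what FFT analysis measures, the sketch below builds a synthetic "voiced" signal from three harmonics of a 200 Hz fundamental and finds where its energy lies. It uses a plain discrete Fourier transform (an FFT is a fast way of computing the same result). The signal, the sample rate and the function name are assumptions made for this example, and it is not meant to show how Praat itself implements the analysis.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive discrete Fourier transform; returns magnitudes for bins 0..N/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags


sample_rate = 8000   # Hz (assumed for the example)
n = 400              # 0.05 s of signal, so bins are 20 Hz apart
f0 = 200             # Hz, fundamental frequency of the synthetic voice

# Three harmonics with falling amplitude: 200, 400 and 600 Hz.
harmonics = ((1, 1.0), (2, 0.5), (3, 0.25))
signal = [
    sum(amp * math.sin(2 * math.pi * f0 * h * t / sample_rate)
        for h, amp in harmonics)
    for t in range(n)
]

mags = dft_magnitudes(signal)
bin_hz = sample_rate / n

# The three strongest bins should sit at the three harmonics.
strongest = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:3]
peak_freqs = sorted(k * bin_hz for k in strongest)
print(peak_freqs)
```

The energy peaks fall on the harmonics that are actually present in the sound, which is why FFT analysis of a high-pitched voice tends to show widely spaced harmonics rather than the underlying formants.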
2. FFT parameters
- The main choice in FFT analysis is between a coarser setting, which shows the formants, and a finer setting, which shows the voice harmonics. These correspond to the wideband and narrowband settings, respectively, of the spectrograph. A filter passband around 300 Hz allows two or three strong voice harmonics at a time to pass through and be registered together as energy peaks. A filter passband around 50 Hz admits just one voice harmonic at a time, so each harmonic is recorded individually. Unfortunately, when the voice fundamental frequency is high (some high-pitched male voices and all female voices), the harmonics are spaced so widely that the wider filter tends to act like the narrower one, resolving the voice harmonics rather than the formants. The parameter to set in Praat is the analysis window duration: around 0.003 s to 0.005 s is suitable for the coarser wideband analysis, and around 0.03 s for the finer narrowband analysis. This is outlined in detail in Making spectrograms and Making FFT slices.
- Other FFT parameters affect the appearance of the finished spectrogram or slice on the screen: View range sets the frequencies you wish to see, and Dynamic range hides intrusive background noise.
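The trade-off between window duration and filter bandwidth can be checked with a little arithmetic. The sketch below assumes the relation given in the Praat manual for its Gaussian analysis window (bandwidth of roughly 1.2982804 divided by the window length). The function names and the two example fundamental frequencies are illustrative only.

```python
# Assumption: Praat's manual gives the -3 dB bandwidth of its Gaussian
# analysis window as roughly 1.2982804 / window length (in seconds).
GAUSSIAN_BANDWIDTH_FACTOR = 1.2982804


def bandwidth_hz(window_length_s):
    """Approximate analysis bandwidth (Hz) for a given window length (s)."""
    return GAUSSIAN_BANDWIDTH_FACTOR / window_length_s


def harmonics_per_band(window_length_s, f0_hz):
    """Roughly how many voice harmonics fit inside one analysis band."""
    return bandwidth_hz(window_length_s) / f0_hz


for label, window in (("wideband", 0.005), ("narrowband", 0.03)):
    for f0 in (100, 220):  # illustrative low and high fundamental frequencies
        print(f"{label:10s} window={window:.3f} s  "
              f"bandwidth={bandwidth_hz(window):6.1f} Hz  "
              f"f0={f0} Hz  "
              f"harmonics per band={harmonics_per_band(window, f0):.2f}")
```

A ratio well above 1 means several harmonics are merged into one band, so formants show up. A ratio near or below 1 means individual harmonics are resolved. With a 0.005 s window the ratio is about 2.6 at 100 Hz but only about 1.2 at 220 Hz, which is the high-pitch problem described above.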
3. LPC parameters
- The formant picking procedure used with LPC in Praat works best when a few more formants are sought than are actually needed, e.g. if you want F1 to F3 then analyse 5 or 6 formants.
- LPC has to be tuned to the speaker’s vocal tract length, by setting a suitable number of formants to be found in a given frequency range. An average setting is 1 formant every 1000 Hz for men and 1 every 1100 Hz for women, which can then be optimized for a particular speaker (see Formant tracking and Making LPC slices).
- Finally, the Prediction order needs to be set (the number of filter coefficients for a desired number of formants). Allow at least 2 coefficients for each formant, and preferably at least 2 more overall, e.g. for 5 formants the Prediction order should be at least 12 (2×5, +2).
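The three rules of thumb above can be combined into a small calculation of starting values. This is a minimal sketch: `lpc_settings` is a hypothetical helper, not a Praat function, and the default of two extra formants is an assumption taken from the "5 or 6 formants for F1 to F3" suggestion.

```python
# Average formant spacing quoted above: one formant per 1000 Hz for men,
# one per 1100 Hz for women. These are starting points, not fixed values.
FORMANT_SPACING_HZ = {"male": 1000, "female": 1100}


def lpc_settings(formants_needed, speaker, extra_formants=2):
    """Suggest starting LPC settings (hypothetical helper, not part of Praat)."""
    n_formants = formants_needed + extra_formants   # seek a few more than needed
    max_frequency = n_formants * FORMANT_SPACING_HZ[speaker]
    prediction_order = 2 * n_formants + 2           # 2 per formant, plus 2
    return {
        "number_of_formants": n_formants,
        "maximum_frequency_hz": max_frequency,
        "prediction_order": prediction_order,
    }


print(lpc_settings(3, "male"))
print(lpc_settings(3, "female"))
```

For F1 to F3 this suggests 5 formants up to 5000 Hz for an average male speaker and up to 5500 Hz for an average female speaker, with a Prediction order of 12 in both cases. These values are only a starting point to be tuned for the individual speaker.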