Praat for Beginners:
Formant tracking in the Sound editor
For formant tracking, Praat picks the formants from the LPC peaks at regular time intervals along the signal. A typical use is to read off the formant track values as an alternative to measuring formant frequencies on the spectrogram itself.
- Getting Started
- Parameter settings: tuning the analysis
- Parameter settings: Appearance
- Reading off formant frequencies
- Formant tracks for female speakers
- If you’re still not sure what spectrograms are or what they show, you should read these sections first:
- Formant tracking finds and plots each formant at regular intervals along a spectrogram, producing a continuous red trail through each formant. The numerical values of each formant track can be read off at selected time points or intervals.
- The spectrograms are small here to make them fit this web page. When you are working with formant tracking you will probably want to use the whole screen in order to see all the detail.
- The method used for calculating the formant frequencies is linear prediction (LPC), which estimates the vocal tract filter and identifies the vocal tract resonance frequencies. The advantage is that the result is not sensitive to the voice harmonics and gives better results for the higher pitched voices of women and children. The cost is that the analysis has to be tuned to the characteristics of each individual speaker.
- The procedure begins with creating a spectrogram, then continues by finding the formants according to the parameter settings, then plotting them on the spectrogram. Finally, some parameters are finely tuned in order to optimize the formant track.
- Once you have optimized the settings for a speaker, the same settings should always be used again for more speech from the same speaker. Keep notes.
- Tuning parameter settings is described for an adult male speaker first, and then optimized for a female speaker.
1A. Speech examples to illustrate formant tracking
- A male and a female speaker are used to illustrate formant tracking in the Sound editor, the same samples that were used for illustrating spectrograms. Their utterances offer a good selection of vowels, sonorants, stops and fricatives that demonstrate the capabilities of formant tracking in Praat. The transcriptions you see were added later using an image editor; they are not available in the Sound editor.
- The first is an adult male speaker of Swedish saying finns det dokumentära inslag (“there are documentary items”):
- The second is an adult female speaker of Swedish saying ett forskningsprojekt (“a research project”):
1B. When is it meaningful to use formant tracking?
- To be completely successful, a formant tracking program must identify all formants correctly everywhere so that you can read off the results with confidence.
- Formant tracking only has a chance of success in oral vowels, because their formants are generally recognizable and can be identified correctly and numbered sequentially from F1 up. Formant tracking might be less successful in nasal vowels owing to the effects of antiresonance and nasal resonance that will interfere with the correct recognition and labelling of formants. Formant tracks seen in consonants are useless and must be ignored because they are usually labelled incorrectly. Antiresonance practically always prevents correct formant identification in consonants.
- Formant tracking in Praat identifies formants by referring analysed peaks to known average frequencies of F1, F2 etc. Always inspect the formant track plots and compare them with the formants seen on the spectrogram, as a check. Note the following typical difficulties:
- Back vowels are always difficult because F1 and F2 partly overlap. In the worst case F2 may not display a distinct peak and will only appear as a hump on the flank of the F1 peak. Inspect the F2 track plot and check that it is always present.
- The identification procedure might not always work well with nasal vowels owing to the presence of nasal formants in the spectrum and the possible attenuation of oral formants by antiresonance. Again, inspect the spectrogram and make sure a track does not take a false route through a nasal formant.
2. Getting started
- Start by creating a spectrogram, and showing formant tracks.
- Load your speech signal into the Objects window as a Sound object, select it and click View & Edit.
- The Sound editor opens, still showing whatever analyses had been used in the previous session.
- Open the View menu and select Show analyses. Then tick Show spectrogram and Show formants, and untick everything else:
- Note also the setting Longest analysis. Your selected analyses will be displayed only when you are viewing less than this length of signal in the Sound editor. If you happen to be viewing more, the analyses will not be done. Zoom in as necessary to a shorter portion of the signal, or increase the time set here at Longest analysis.
- Click OK.
- The spectrogram appears, with the red formant tracks across it.
- Note how the formant tracks appear to be best through vowels, and erratic through consonants.
- Note also any formant track errors here and there during the vowels (missing plots, unmotivated jumps etc). These are due to the formant tracking parameters not being tuned yet.
- Next, adjust the formant tracking parameter settings.
- Open the Formant menu and select Formant settings:
- Note that there are two menu items, Formant settings and Advanced formant settings. One is for parameters that might need altering more frequently, the other is for settings you will only need to alter very occasionally.
- The following Formant settings dialog box appears:
- The various parameters are explained in the next section as they arise. For now, start with the default settings (Standards).
- When all is ready, the formant tracks appear as red lines overlaid on the spectrogram. The male example is shown here; formant tracking for female speakers is dealt with later:
- Inspect the tracks and compare them with the formants on the spectrogram. The frequencies of the formant tracks might deviate somewhat from the formants seen on the spectrogram, but the tracks and the formants should not shift in different directions. Ignore the formant tracks in consonants, but check that the required number of formants has been found in each vowel (the default five formants in this example).
- Look for F2 in any back vowels. In this example the last vowel is back, and F2 is complete in the formant track.
- But F4 and F5 are not tracked properly in parts of some vowels in this example.
- The results of the tracking procedure can usually be improved by tuning the parameter settings to the speaker. This will be explained in the following sections. Once you have found the best settings you should keep them for all formant tracking for that speaker.
- The critical parameters are Maximum formant frequency and Number of formants. Taking them in the reverse order, these are the number of formants you want to see, and the expected maximum frequency of the highest of those formants. The values of these parameters are intimately related and highly person-dependent, reflecting the individual vocal tract length.
3.A. Number of formants to be reported
- The number of formants to be reported is set at Number of formants. The default setting is 5 formants, exemplified above.
- The number of formants should match the Maximum formant frequency set for each speaker; see the next section.
- It is good practice to analyse a few more formants than you actually need, in order to get better tracking of the ones you really want. For example, track five formants if you want F1 and F2, or F1 to F3.
3.B. Frequency range
- It is necessary to restrict the frequency range where Praat will look for the set number of formants, to avoid searching for too many or too few formants in the stated range. A rough guideline for adult male speakers is one formant in each 1000Hz band, so start by multiplying the number of formants by 1000Hz, e.g. 5000Hz for five formants. For adult female speakers expect one formant in each 1100Hz band. The result is then set at Maximum formant. The default Maximum formant setting, 5500Hz, is intended for adult female speakers. If Maximum formant is set too low, one or more formants will be excluded and not tracked. If it is set too high, more formants than required will be present during the search, resulting in possible ambiguity when formants are identified. Maximum formant reflects vocal tract length, the higher formant frequencies of women and children being due to proportionately shorter vocal tracts. Different individuals can consequently deviate from the suggested settings, and it is worthwhile to tune this parameter to the speaker. A setting of 5100-5300Hz turned out to be best for the male example shown above.
- It is also a good idea to set the frequency range of the background spectrogram to a little higher than Maximum formant, to make sure you can see all the formants that are to be analysed, e.g. spectrogram to 6000Hz when Maximum formant is 5000-5500Hz. Remember that the frequencies you want to look at must be present in the signal, which is determined by the sampling frequency when you digitised the signal.
- If a different number of formants is wanted, the setting of Maximum formant must also be adjusted to match. For example, if 3 formants are set for a male speaker, Maximum formant will be around 3000Hz. Starting with 3000Hz, Maximum formant was gradually raised until 3200Hz produced all three formants across all vowels in this example:
- However, the F3 track in the final vowel deviated noticeably from the F3 on the spectrogram (see the green arrow above). The fewer formants you ask Praat to search for, the trickier becomes the task of finding them. This example demonstrates the wisdom of asking for a few more formants than you actually need, to ensure successful tracking of the formants you really want to see. The following version is the result of asking for four formants below 4100Hz. The arrow (below) now shows that the F3 track at the end of the last vowel follows F3 on the spectrogram more closely:
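- The starting values suggested in this section can be worked out with a few lines of Python. This is only a sketch of the rule of thumb given above (1000Hz per formant for adult male speakers, 1100Hz for adult female speakers) plus a check that the search range lies below half the sampling frequency. The function names are made up for this illustration and are not part of Praat; the result is a first guess that still has to be tuned by eye.

```python
# Rule-of-thumb band width per formant, taken from the guideline above.
HZ_PER_FORMANT = {"male": 1000.0, "female": 1100.0}


def starting_maximum_formant(number_of_formants, speaker):
    """Return a first guess for Maximum formant in Hz (hypothetical helper)."""
    return number_of_formants * HZ_PER_FORMANT[speaker]


def present_in_signal(maximum_formant_hz, sampling_frequency_hz):
    """True if the search range lies at or below half the sampling frequency."""
    return maximum_formant_hz <= sampling_frequency_hz / 2.0


for speaker in ("male", "female"):
    for n in (3, 4, 5):
        guess = starting_maximum_formant(n, speaker)
        print(f"{speaker:6s} {n} formants: start near {guess:.0f} Hz")

# A signal sampled at 16000 Hz contains frequencies up to 8000 Hz,
# a signal sampled at 8000 Hz only up to 4000 Hz.
print(present_in_signal(5500.0, 16000.0))
print(present_in_signal(5500.0, 8000.0))
```

- Treat the printed values as the point to start tuning from: in the male example above, the settings finally chosen were somewhat higher than the rule of thumb.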
3.C. Analysis window length
- The Analysis window length for formant tracking specifies how much signal is used for the calculations at each step. The default value is 0.025s, and this will rarely, if ever, need to be changed.
- The analysis window for formant tracking is set, and functions, independently of the analysis window already set for the spectrogram. It does not help you optimise the formant track or tune the analysis to the speaker. Shorter window sizes give erratic formant track contours. Lengthening the window smooths the contours and ultimately, with very large values, flattens them. The default window is a compromise between these extremes.
- A point to remember is that the window cannot function properly at the very beginning and end of the signal, so no formant tracks are calculated there. It is a good idea to have silence at the beginning and end so that all the speech gets analysed. If you are working on a long signal, the spectrogram and formant tracks are already limited by the Longest analysis setting in View>Show analyses.
4.A. Dot size
- This concerns the red dots of the formant tracks. You might find it convenient to reduce Dot size to 0.5mm or smaller, to avoid concealing too much of the spectrogram behind the dots of the formant track. Or, use a larger dot for a large Sound editor window, and a smaller dot for a smaller Sound editor window. This does not affect the analysis, but the result can be easier to look at:
4.B. Time steps
- The formant tracks are calculated and plotted at regular intervals through the visible part of the signal in the Sound editor. The size of the interval does not affect the quality of the analysis, but it does affect the appearance of the track contours (dots or line). It also affects the temporal resolution of the track analysis – shorter steps give more formant computations that give better definition of the track contours.
- The size of the interval is specified by the setting of the Time step parameter. This is completely independent of the spectrogram Time step at Spectrum>Advanced spectrogram settings.
- The formant tracking Time step method is selected from the View menu:
- The Time step settings dialog opens:
- These Time step settings are shared with the Pitch and Intensity analyses.
- Press the Time step strategy button and a dropdown menu will open offering three settings:
- Automatic: The default (and recommended) setting, which derives the step from Analysis window length. With the default 25ms window, the automatic time step is 6.25ms.
- Fixed: You specify your own step size in the space provided; a reasonable setting comparable to the automatic setting is 5ms. The default setting is 10ms.
- View dependent: There is a constant number of steps across the Sound editor window, that you specify in the space provided. This step size depends on the duration of the portion of signal on view; the more you zoom in, the shorter the time step will become. The default setting is 100 Time steps.
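- The three strategies can be compared with a little arithmetic. The sketch below assumes that the automatic step is one quarter of the Analysis window length, which is what the figures above imply (25ms window, 6.25ms step), and that the view-dependent step is simply the visible duration divided by the number of steps. The function and its parameter names are invented for this illustration and are not Praat commands.

```python
def time_step_seconds(strategy, window_length=0.025, fixed_step=0.01,
                      visible_duration=None, steps_per_view=100):
    """Return the formant time step in seconds for one of the three strategies."""
    if strategy == "automatic":
        # Assumption: a quarter of the analysis window (0.025 s -> 0.00625 s).
        return window_length / 4.0
    if strategy == "fixed":
        # The step you type in yourself (the text gives 10 ms as the default).
        return fixed_step
    if strategy == "view-dependent":
        if visible_duration is None:
            raise ValueError("view-dependent needs the visible duration")
        # A constant number of steps across whatever is on view.
        return visible_duration / steps_per_view
    raise ValueError(f"unknown strategy: {strategy}")


print(time_step_seconds("automatic"))
print(time_step_seconds("fixed", fixed_step=0.005))
# Zooming in from 2 s to 0.5 s of signal shortens the view-dependent step.
print(time_step_seconds("view-dependent", visible_duration=2.0))
print(time_step_seconds("view-dependent", visible_duration=0.5))
```

- The last two lines show why the view-dependent strategy gives a different step every time you zoom, whereas the automatic and fixed strategies stay the same regardless of how much signal is on view.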
4.C. Dynamic range
- In the Formant menu, this parameter determines which parts of the tracks you show or hide by specifying a limit for weak spectral peaks, measured in dB down from the strongest peak.
- The default setting for formant tracking is 30dB.
- If a track is unexpectedly missing, try increasing this range slightly.
- If too many weak peaks are being plotted and Praat has difficulty assigning formants to them, try reducing this range.
- The Dynamic range was reduced to 20dB for the next example:
5. Reading off formant frequencies
- Once the parameters are optimized you can read off track values to get the formant frequencies at some instant of interest. For example, make a selection around some vowel that interests you or position the cursor there (do not include consonants in the selection). Then open the Formant menu and select Formant listing:
- Here is a signal selection during the final vowel of the male speaker example, and the corresponding Formant listing showing the track formant frequencies at each time step within the selection:
- And here is an example of the Formant listing for a cursor location in the centre of the selection from the previous example:
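- If you copy a Formant listing out of Praat as text, the rows for a selection can be summarised with a short Python script. The sketch below assumes that the listing is a whitespace-separated table with a header row such as Time_s F1_Hz F2_Hz ..., and that a missing formant is written as --undefined--; check this against your own listing. The numbers in the sample are invented purely for illustration and are not measurements from the speakers in this tutorial.

```python
from statistics import mean

# Invented values for illustration only; paste your own listing here.
LISTING = """\
Time_s   F1_Hz   F2_Hz   F3_Hz   F4_Hz
1.500000   500.0   1000.0   2500.0   3500.0
1.506250   510.0   1010.0   2490.0   --undefined--
1.512500   520.0   1020.0   2480.0   3520.0
"""


def parse_listing(text):
    """Turn the listing into {column name: [values]}, with None for gaps."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    table = {name: [] for name in header}
    for row in rows:
        for name, cell in zip(header, row):
            try:
                table[name].append(float(cell))
            except ValueError:
                # Anything that is not a number is treated as a missing value.
                table[name].append(None)
    return table


def column_means(table):
    """Mean of each formant column, skipping missing values."""
    means = {}
    for name, values in table.items():
        if name.startswith("F"):
            defined = [v for v in values if v is not None]
            means[name] = mean(defined) if defined else None
    return means


for name, value in column_means(parse_listing(LISTING)).items():
    print(name, value)
```

- Averaging over the selection is only one possible choice; for a vowel with moving formants you may prefer the single row nearest the time point you are interested in.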
6. Formant tracks for female speakers
- The principles for parameter tuning apply equally to female voices, as demonstrated by the following examples.
- The main spectral differences between male and female speech are that women tend to have roughly 10% higher formant frequencies (due to their generally shorter vocal tracts), while the higher voice fundamental frequency means there is a greater interval between voice harmonics and consequently weaker definition of formants, making them harder to find.
- Since the formant frequencies of female speakers are higher, a Maximum formant setting of 5500Hz is recommended for the default 5 formants to be tracked (and is the default setting for Maximum formant).
- To make sure there is sufficient viewing space to see these higher frequencies, go to the Spectrogram settings and set the upper limit of View range to 6000Hz or 7000Hz. A setting of 7500Hz was necessary for this example.
- With all default settings, including the recommended Maximum formant of 5500Hz for a female speaker:
- These settings did not find a fifth formant anywhere in any vowel. Also, in the back vowel (arrowed), the F2 track is erratic.
- The setting of Maximum formant was raised, until finally at a setting of 7000Hz:
- The fifth formant is now tracked, but not everywhere, and sometimes at the expense of lower formant tracks. The F2 track of the back vowel (arrowed) has now disappeared.
- There is little more that can be done. There is just one more trick in the bag, and that is to raise Maximum formant even higher and track correspondingly more formants, in the hope that this will at least improve the definition of the five formants being sought. Maximum formant was raised to 8000Hz and 7 formants were tracked in the following example:
- There are now five formant tracks in all the vowels, and the F2 track in the back vowel (arrowed) is showing again.
- This demonstrates once again that you are most likely to get better results by analysing a larger range and more formants than you actually intend to look at.
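- One way to compare the settings tried in this tutorial is to divide Maximum formant by Number of formants, which gives the average band width allowed per formant. The sketch below does only that arithmetic on the values quoted in the text; the helper name is invented, and the comparison with the 1000Hz and 1100Hz guidelines is an observation about these two speakers, not a rule built into Praat.

```python
def hz_per_formant(maximum_formant_hz, number_of_formants):
    """Average band width available to each formant (hypothetical helper)."""
    return maximum_formant_hz / number_of_formants


# (speaker, Maximum formant in Hz, Number of formants, outcome reported in the text)
REPORTED = [
    ("male", 5100, 5, "lower end of the best range"),
    ("male", 5300, 5, "upper end of the best range"),
    ("male", 3200, 3, "all three found, F3 deviates in the final vowel"),
    ("male", 4100, 4, "F3 follows the spectrogram more closely"),
    ("female", 5500, 5, "no F5, F2 erratic in the back vowel"),
    ("female", 7000, 5, "F5 partly tracked, F2 lost in the back vowel"),
    ("female", 8000, 7, "five tracks in all vowels"),
]

for speaker, maximum, count, outcome in REPORTED:
    width = hz_per_formant(maximum, count)
    print(f"{speaker:6s} {maximum} Hz / {count} = {width:6.0f} Hz per formant: {outcome}")
```

- In these examples the 7000Hz setting with five formants allows 1400Hz per formant, much more than the guideline, whereas 8000Hz with seven formants comes back to roughly 1140Hz per formant.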