Praat for Beginners:
Making spectrograms in the Sound editor
- Improving the appearance of the spectrogram
- Wideband and narrowband spectrograms
- Frequency range
- Printing and saving the spectrogram
- If you are not sure what spectrograms are or what they show, you should read these sections first:
- Understanding spectrograms
- What are formants.
- See also Quick guide to spectral analysis
- There is an alternative route to making spectrograms from the Objects window, but this one in the Sound editor is more straightforward and easier to start with.
- Praat makes spectrograms by analysing the spectrum of the speech waveform at brief but regular time intervals, or time steps, along the speech signal.
- The spectrograms are small here to make them fit this web page. When you are working with spectrograms you will want to see them larger on your screen in order to see all the detail. A standard paper spectrogram from a speech spectrograph was about 30×10 cm and spanned about 2.3 seconds.
- Spectral analysis of voiced sound in speech (especially vowels and sonorants) is sensitive to fundamental frequency (the harmonics of higher pitched voices are farther apart which means there are fewer harmonics in a spectral peak, leaving the peaks less well defined). Consequently, the original sound spectrographs with fixed filter settings were notoriously unsuccessful at producing good spectrograms of female and child voices. The digital procedures used today are flexible and the analysis can be more successfully tuned to the speaker’s voice. To illustrate this, both male and female speech samples are shown below.
- The analysis method is the Fast Fourier Transform (FFT), which calculates the spectrum of the sound emerging from the lips. The advantage of FFT is easier setup; the disadvantage is the increasing difficulty of identifying formants for speakers with higher pitched voices.
- There is an alternative method, linear prediction (LPC), that is not affected by voice pitch and is therefore more successful at finding the formants when voice pitch is higher. LPC is used in Praat for formant tracking, but do make sure you understand FFT spectrograms before you tackle that.
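- The idea of analysing the spectrum at brief, regular time steps can be sketched in a few lines of Python. This is only an illustration and not Praat's own code: it uses a plain discrete Fourier transform (the FFT is a fast way of computing the same thing) and a Hann window, and the names `frame_spectrum` and `spectrogram`, the 8000Hz sampling rate and the 1000Hz test tone are all assumptions made for the example.

```python
import cmath
import math


def frame_spectrum(samples):
    """Magnitude spectrum of one analysis frame (plain DFT, slow but dependency-free)."""
    n = len(samples)
    # Taper the frame with a Hann window (an assumption for this sketch).
    windowed = [
        s * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(samples)
    ]
    # Keep bins from 0 Hz up to half the sampling rate.
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, sample_rate, window_s, step_s):
    """One spectrum per time step: a list of frames, each a list of magnitudes."""
    win = int(round(window_s * sample_rate))   # samples per analysis window
    hop = int(round(step_s * sample_rate))     # samples between successive frames
    return [
        frame_spectrum(signal[start:start + win])
        for start in range(0, len(signal) - win + 1, hop)
    ]


# Hypothetical test signal: 0.1 s of a 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(800)]

frames = spectrogram(signal, sample_rate, window_s=0.005, step_s=0.0025)
bin_hz = sample_rate / 40                      # 40 samples per window -> 200 Hz per bin
peak_bin = max(range(len(frames[0])), key=lambda k: frames[0][k])
print(len(frames), peak_bin * bin_hz)          # number of frames, frequency of strongest bin
```

- Each inner list is one vertical slice of the spectrogram. In a real display the magnitudes would be converted to shades of grey, with time running left to right and frequency bottom to top.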
1A. Speech examples used for illustrations
- A male and a female speaker are used to illustrate the creation of spectrograms in the Sound editor. Their utterances offer a good selection of vowels, sonorants, stops and fricatives that demonstrate the capabilities of spectrograms in Praat. The transcriptions you see were added later using an image editor; they are not available in the Sound editor.
- The first is a Swedish adult male speaker saying
finns det dokumentära inslag (“there are documentary items”):
- The second is a Swedish adult female speaker saying
ett forskningsprojekt (“a research project”):
1B. Getting started
- First, load your signal into the Objects window as a Sound object (make a recording or open a sound file), select it, and click View & Edit. The Sound editor opens. You will see the signal waveform, and possibly any analysis left from the previous session (the Sound editor analysis settings are saved from session to session).
- If the spectrogram is not already visible in the Sound editor, open the Spectrum menu and tick Show spectrogram:
- Turn any other analyses off if they happen to be visible. There is a long and a short way to do this. The long way is to open each relevant analysis menu in turn, and untick it (Pitch, Intensity, Formant, Pulses as the case may be). The short way is to open the View menu and select Show analyses.
- Then, in the dialog box that appears, untick the unwanted analyses (in fact you could also have selected the spectrogram here, and fixed all these selections and deselections in one go):
- Note also the setting Longest analysis. Your selected analyses will only be displayed when you are viewing less than this duration in the Sound editor. If you happen to be viewing more, the analyses will not be displayed. Zoom in as necessary to a shorter portion of the signal, or increase the time set here at Longest analysis.
- There is one more step, to select the analysis settings you want for your spectrogram. Open the Spectrum menu and select Spectrogram settings:
- Note that there are two setting items, Spectrogram settings and Advanced spectrogram settings. One is for analysis parameters that might need altering more frequently, the other is for settings you will only need to alter very occasionally. The following Spectrogram settings dialog box appears:
- This overview of spectrograms will start by resetting to the default settings (Standards), but for your everyday work you would set any analysis parameters you actually needed (these parameters will be explained later as they arise). The default settings will give you a spectrogram looking something like these male and female examples:
2. Improving the appearance of the spectrogram
- There are one or two things you can do to optimise the appearance of the spectrogram, so that you can extract the maximum of information from it when you come to study it in detail:
- The resolution or definition (detail sharpness) of the spectrogram image
- The exclusion of background noise so that you only see speech detail.
2A. Image resolution
- First, a word about how your spectrogram is displayed as an image on your computer screen. The spectrogram image consists of numerous small dots, and the space in the Sound editor window also consists of numerous small dots. Both these sorts of dots are called pixels in the world of images. The pixels of the spectrogram image have to be fitted onto the pixels of the Sound editor window, which is done for you by Praat. But there are some decisions you make that affect the process and give you some control over the definition of this image.
- The first, image resolution, is related to how many spectrogram dots there are to be plotted in the available space. This is determined by the number of time and frequency increments at which the spectrogram is calculated, numbers that you can set yourself at Time steps and Frequency steps.
- The second, window resolution, is related to how much space (number of screen pixels) is available in the Sound editor for the spectrogram, and to the display resolution of your monitor (usually set when it was installed, typically 640×480 pixels or 800×600 pixels or 1024×768 pixels and so on). The size of the Sound editor window depends on how large you make it (by dragging the sides, or maximizing it).
2B. Temporal resolution: Time steps
- The default temporal resolution is 1000 time increments (or Time steps) spread across the Sound editor window. This is adequate for a maximized Sound editor on a screen that is about 1000 pixels wide. If you have a very wide screen you could try increasing the number of Time steps to match the display resolution of the monitor, say 1500 or 2000.
- You will find the Time steps setting in the Advanced spectrogram settings:
- To demonstrate how the Time step setting affects the appearance of the spectrogram, and how the appearance can be optimized, try this experiment: maximize the Sound editor window, and set Time steps to 10, 100, 250, 500 etc. in turn, and see what the spectrogram looks like each time. At first the spectrogram will look like a walnut tabletop, then it will look like a spectrogram but seen through a bathroom window, then it will look even better but you might notice that some glottal pulses are missing. As you continue towards 1000, or beyond if you have a wide screen, you will eventually see no further improvement in the definition of the spectrogram. You have reached the point where the pixels of the spectrogram image have at least one screen pixel each. Below that setting your spectrogram will not look its best, above that setting you will not see any improvement.
- Time steps also has another job to do. Your purpose in making a spectrogram is to obtain information about the speech sample, and you will need a certain analysis resolution in order to get that information, depending on what temporal precision you are looking for. Suppose a few minutes of speech is on view with the default 1000 steps. Each step will then be about 200 or 300ms. This means there will be just one spectrum calculated per syllable or even per word and there will be no information at the phoneme level, let alone briefer events like stop bursts, aspiration, glottalization, diphthong dynamics etc. This is clearly so coarse that it would not give any worthwhile information at all. So the first decision is to display and analyse a reasonable amount of speech, around a couple of seconds. You might also like to be sure there is at least one spectrum calculated for every glottal pulse (e.g. at least every 5-10ms for men, or 3-5ms for women, or 2-3ms for children). The default 1000 Time steps gives precisely 2.5ms steps when there is 2.5s of signal.
- Putting all this together, the necessary minimum step setting is whichever is the larger of (1) the number of steps that gives the best display resolution, and (2) the number of steps for your desired analysis resolution. If your number is smaller than the default 1000, you can keep the default anyway.
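- The arithmetic behind this choice is simple enough to check by hand, but a small Python sketch may help. The function names are hypothetical and nothing here calls Praat; it only divides the visible duration by the number of Time steps, and takes the larger of the display and analysis requirements as described above.

```python
import math


def step_duration_ms(visible_s, time_steps=1000):
    """Duration of one time step, in ms, for the portion of signal on view."""
    return 1000.0 * visible_s / time_steps


def steps_needed(visible_s, desired_step_ms, display_steps=1000):
    """Larger of (1) the steps wanted for display and (2) the steps wanted for analysis."""
    analysis_steps = math.ceil(1000.0 * visible_s / desired_step_ms)
    return max(display_steps, analysis_steps)


print(step_duration_ms(2.5))      # 2.5 ms per step with 2.5 s on view
print(step_duration_ms(240))      # 240.0 ms per step with four minutes on view
print(steps_needed(2.5, 5.0))     # 1000: the display requirement dominates
print(steps_needed(4.0, 2.0))     # 2000: the analysis requirement dominates
```

- The `display_steps` value of 1000 stands for whatever setting looked best on your own screen in the experiment above.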
- Nothing comes free. Spectrograms involve a very heavy work load for your computer, and increasing the number of time steps will give your computer even more work to do. If your computer resources are overtaxed, reduce the work load by closing any other programs that are open. Try shortening the visible portion of the signal (i.e. zoom in slightly).
2C. Frequency resolution: Frequency steps
- The same considerations just outlined above for temporal resolution also apply to frequency resolution. The definition of your spectrogram along the frequency axis depends on how much data there is to plot, determined by the number of Frequency steps in Spectrum/Advanced spectrogram settings. The default setting is 250 Frequency steps that are spread vertically over the spectrogram, giving an increment of 20Hz for the default frequency range of 0-5000Hz, or 40Hz if you alter the frequency range to 0-10,000Hz. You can try the same experiment, setting Frequency steps to 10, 100, 250 etc. in turn and observing the effect on the appearance of your spectrogram, thereby optimising the setting for your screen.
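- The frequency increments quoted above follow from dividing the view range by the number of Frequency steps. A minimal Python sketch, with a hypothetical function name:

```python
def frequency_increment_hz(low_hz, high_hz, frequency_steps=250):
    """Spacing between adjacent frequency steps across the view range."""
    return (high_hz - low_hz) / frequency_steps


print(frequency_increment_hz(0, 5000))        # 20.0 Hz with the default range
print(frequency_increment_hz(0, 10000))       # 40.0 Hz with a 0-10,000 Hz range
print(frequency_increment_hz(0, 10000, 500))  # 20.0 Hz again after doubling the steps
```

- The last line suggests one way to keep the same increment when you widen the frequency range: raise the number of Frequency steps in proportion.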
2D. Background noise: Dynamic range
- The default settings will sometimes give you noisy spectrograms, a grey cast concealing some of the speech detail. This can be improved by changing the Dynamic range in the Spectrogram settings dialog. This determines how much of the energy in the signal is to be displayed in the spectrogram (measured in dB down from the level of the strongest energy peak, like measuring the Himalayan mountains downwards from the top of Everest). Larger dB settings go deeper into the background noise, smaller dB settings come above the background noise. The ideal is to exclude as much of the background noise as possible while still retaining weaker speech detail. Look for details like the weak hiss of [f, v, z] or [h]. An excellent signal, recorded with utmost care, will allow you to exclude all noise and see all and only speech detail. A carelessly recorded signal, with an undue noise level, will compromise weaker speech detail.
- Reduce the Dynamic range setting in 3dB steps, each of which halves the amount of acoustic energy that you will see in the spectrogram. The background will become whiter at each step as more and more noise energy is hidden. Continue stepping like this until you start losing speech detail, then go back one step.
- The Dynamic range was reduced to 47dB for the male example and 44dB for the female example:
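- To see why a 3dB step amounts to halving, recall that a level difference in dB is ten times the base-10 logarithm of an energy ratio. The sketch below is plain Python with a hypothetical function name, and the starting value of 50dB is chosen only for illustration.

```python
def fraction_of_peak_energy(db_down):
    """Energy at db_down dB below the strongest peak, as a fraction of that peak."""
    return 10 ** (-db_down / 10)


print(round(fraction_of_peak_energy(3), 3))   # 0.501, i.e. roughly half

# Step the floor of the display upwards in 3 dB steps from an illustrative 50 dB.
for dynamic_range_db in (50, 47, 44):
    floor = fraction_of_peak_energy(dynamic_range_db)
    print(dynamic_range_db, "dB shows energy down to", floor, "of the peak")
```

- Each 3dB reduction roughly doubles the weakest energy that is still drawn, which is why the background whitens step by step.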
3. Wideband and narrowband spectrograms
- Traditionally, speech spectrograms are either wideband or narrowband, so called from the size of the electronic bandpass filter (330Hz or 45Hz respectively) that swept the frequency range on the original speech spectrographs. The practical significance is that wideband spectrograms show formant structure while narrowband spectrograms reveal the harmonic structure.
- When the voice fundamental frequency is high (some high-pitched male voices and all female voices), the wider filter has a tendency to act like the narrower filter, resolving the voice harmonics rather than formants. An advantage of digital spectral analysis is that the bandwidth of the wide filter can be adjusted to improve performance with high pitched voices.
- The filter bandwidth is set at Window length (s) in the Spectrogram settings dialog box. The unit for this setting is seconds and not Hz, and is the duration or time constant of the filter rather than its bandwidth, representing the amount of signal data that the filter has to “see” (i.e. that has to pass through the filter) to give a result.
- For wideband spectrograms, a time constant of 4ms or 5ms will do (but remember it has to be entered in the settings box in seconds, i.e. 0.004 or 0.005). The examples given above are typical wideband spectrograms. The default setting is 0.005s, which is fine for most adult male voices. However, this setting might be on the large side for men with high pitched voices, and especially for women (whose voices can be pitched a half to a whole octave higher than men’s voices), and it might then tend to act like a narrowband filter and show individual harmonics at places where the voice frequency is relatively high. If that happens, experiment with slightly smaller time constants of 4ms, 3.5ms etc. The voice fundamental frequency of the female example rose to almost 300Hz in the final syllable, and the voice harmonics tended to intrude when the default setting was used. The time constant was taken down to 0.003s for the next spectrogram:
- For narrowband spectrograms, set Window length to 0.03s, which will be fine for both male and female voices. The two examples are repeated now at this narrowband setting:
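- If you would like to relate Window length back to the traditional bandwidth figures in Hz, the sketch below may help. It assumes the relation given in the Praat manual for its Gaussian analysis window (bandwidth ≈ 1.298 / Window length), so treat the numbers as approximate. The test "bandwidth smaller than the fundamental frequency" is only a rough rule of thumb for when harmonics start to show, and the function names are hypothetical.

```python
import math

# -3 dB bandwidth factor for a Gaussian window, as quoted in the Praat manual
# (assumption: other window shapes would give a different factor).
GAUSSIAN_FACTOR = 2 * math.sqrt(6 * math.log(2)) / math.pi   # about 1.298


def bandwidth_hz(window_length_s):
    """Approximate filter bandwidth for a given Window length in seconds."""
    return GAUSSIAN_FACTOR / window_length_s


def resolves_harmonics(window_length_s, f0_hz):
    """Rule of thumb: harmonics show up once the bandwidth is narrower than F0."""
    return bandwidth_hz(window_length_s) < f0_hz


print(round(bandwidth_hz(0.005)))        # 260 Hz, wideband default
print(round(bandwidth_hz(0.003)))        # 433 Hz, wider still
print(round(bandwidth_hz(0.03)))         # 43 Hz, narrowband
print(resolves_harmonics(0.005, 300))    # True: harmonics intrude at 300 Hz
print(resolves_harmonics(0.003, 300))    # False: formants dominate again
```

- This agrees with the female example above: at 0.005s the harmonics intruded where the fundamental approached 300Hz, and shortening the window to 0.003s removed them.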
4. Frequency range
- The default frequency range is 0-5kHz, as in the previous examples. This is adequate to show at least four or five formants and just get a glimpse of the lower edge of high frequency consonants like [s] or [t]. This requires a sampling rate of at least 10kHz when you digitize your signal. The frequency range is set at View range (Hz) in the Spectrogram settings dialog box, both the lower and upper limits.
- Setting the range to 0-8kHz or 0-10kHz will enable you to see all the spectral components present in a speech signal, usually up to 8kHz or beyond. This will require a signal sampling rate of at least 16kHz or 20kHz respectively when you digitise your signal. The two previous examples are repeated now as wideband spectrograms with the endpoints set at 0 and 10000 in View range (Hz).
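- The sampling rates mentioned in this section follow from the rule that a digitized signal can only contain frequencies up to half its sampling rate. A minimal Python sketch with hypothetical function names:

```python
def min_sampling_rate_hz(top_of_view_range_hz):
    """Lowest sampling rate that can represent frequencies up to the top of the range."""
    return 2 * top_of_view_range_hz


def highest_visible_hz(sampling_rate_hz):
    """Highest frequency a recording at this sampling rate can contain."""
    return sampling_rate_hz / 2


for top in (5000, 8000, 10000):
    print(top, "Hz range needs at least", min_sampling_rate_hz(top), "Hz sampling")

print(highest_visible_hz(16000))   # 8000.0
```

- If the upper limit in View range (Hz) is set above half the sampling rate of your recording, there is simply no signal content to show in that part of the spectrogram.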
5. Printing and saving spectrograms
- There is no need to keep every spectrogram you make in Praat; you can always create new ones from the sound recording. But there are occasions when you will want a printed copy (to put in your notes, or to use as a handout) or an image file (to paste into a written report, or to process in an image editor etc). Spectrograms are printed from the Picture window. This section demonstrates briefly how this is done, while the procedure is described in full detail elsewhere.
- The first step is to transfer the spectrogram from the Sound editor to the Picture window.
- First, mark out the area the spectrogram is to occupy in the Picture window. Position the mouse pointer where you want the top left corner, then hold the left mouse button down and drag the pointer to where you want the bottom right corner. Then release the mouse button. For this example, the default area already marked out is used.
- Then, open the Sound editor Spectrum menu and select Paint visible spectrogram. The Paint visible spectrogram dialog appears:
- For this simple example, only Garnish is selected, which puts a box and scale information round the spectrogram. Then click OK and the spectrogram appears in the selected area in the Picture window:
- The commands for printing the spectrogram, or saving it as an image file, are in the File menu.