Speech waveforms

This page is not complete yet

Praat for beginners:

Tutorial: Speech waveforms

  1. Preliminaries
  2. Periodicity
  3. Annotating the waveform
  4. Recognizing features
    1. Example 1: on our website
    2. Example 2: not got room for
    3. Example 3: material
    4. Example 4: publish
    5. Example 5: reports
    6. Example 6: news items
  5. Segmenting the waveform


1. Preliminaries

  • If you’re not sure what waveforms show, read the tutorials Understanding waveforms and Understanding standard waveforms first.
  • The examples are taken from FM radio broadcasts (British English).
  • It’s not possible to read a speech waveform and work out exactly what has been said.
  • But you can identify characteristic features like voicing, occlusions, stop bursts and fricative hissing.
  • The usual working situation is that you already know what has been said, and you need to recognize the details that locate the parts of the speech wave relevant to each phoneme of the spoken sequence.
  • You will have your recording on view in the Sound editor, where you will also be able to zoom in and out to inspect local detail, and listen to short selections.
  • Remember that a phoneme is an abstract linguistic entity. If it has a physical composition, it’s the neural activity storing the knowledge necessary for its production. For example, before a British English /l/ has emerged from someone’s lips, it will have been narrowed down to a bright or dark variant (allophone) depending on the actual context; speakers of some dialects might additionally vocalize it (make it [w]- or [u-o]-like) in some contexts; finally, its component articulator gestures will be woven into those of the neighbouring phonemes. This all takes place in a unique context of situation and style for that moment. What you then see at any given point in a speech waveform is one instance of a phoneme, similar to and yet different from all other instances of that phoneme.


2. Periodicity

  • This section recaps the corresponding section of the Understanding waveforms tutorial, this time illustrated with speech examples.
  • Periodic waves repeat some portion over and over again. In speech, this reflects the vibrations of the vocal folds during voicing. Aperiodic waves are random rather than repetitive, in speech reflecting the turbulent air movement of the hissing of fricative consonants or aspiration of stops.
  • A first glance along the waveform will immediately reveal the periodic and aperiodic stretches, already showing you where to expect vowels, sonorant consonants, and unvoiced fricatives. Brief moments of silence might indicate occlusions of unvoiced stops or affricates, but they might also be pauses between phrases.
  • While looking at the examples on this page, convince yourself that there are no gaps between words in natural speech. Any gaps you see are either occlusions or pauses.
  • This first example includes an unvoiced stop and a voiced stop.
 
The waveform of the utterance on our website, showing periodic and aperiodic activity.
 
The waveform of the utterance not got room for, with more periodic and aperiodic activity.
  • The next example shows, first, three cycles from a vowel after zooming in, followed by the aperiodic hissing of [s], both at the same zoom scale and for the same duration.
  • Note that the three periodic cycles are superficially the same, but not exactly equal. This is because there is usually some movement somewhere in the vocal tract so that the speech wave is changing all the time. This is as good as it gets. A purist would say this is quasiperiodic rather than periodic.
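The periodic, aperiodic and silent stretches described above can also be told apart numerically. The following is a minimal sketch and not Praat's own algorithm. It assumes a mono signal held as a Python list of samples. A frame is labelled silence when its RMS amplitude falls below an arbitrary threshold. Otherwise the sketch looks for a strong peak in the normalized autocorrelation at lags corresponding to a voice pitch of 75 to 500 Hz. The function names, the 40 ms frame and both thresholds are hypothetical choices. Because speech is only quasiperiodic, a real vowel gives autocorrelation peaks somewhat below 1, which is why the voicing threshold is set well below that.

```python
import math
import random


def rms(frame):
    """Root-mean-square amplitude of a frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def periodicity(frame, min_lag, max_lag):
    """Highest normalized autocorrelation found between min_lag and max_lag.

    Close to 1: the frame repeats itself at some lag (periodic, e.g. voicing).
    Close to 0: no repetition (aperiodic, e.g. fricative hissing).
    """
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        a, b = frame[:-lag], frame[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if den > 0:
            best = max(best, num / den)
    return best


def classify(frame, fs, silence_rms=0.01, voicing_threshold=0.5):
    """Label a frame 'silence', 'periodic' or 'aperiodic' (ad hoc thresholds)."""
    if rms(frame) < silence_rms:
        return "silence"
    # Only consider repetition periods matching a 75-500 Hz voice pitch.
    min_lag, max_lag = int(fs / 500), int(fs / 75)
    if periodicity(frame, min_lag, max_lag) > voicing_threshold:
        return "periodic"
    return "aperiodic"


# Synthetic test frames (assumed 16 kHz sampling, 40 ms long).
fs = 16000
n = round(0.040 * fs)
vowel = [0.5 * math.sin(2 * math.pi * 120 * i / fs)
         + 0.25 * math.sin(2 * math.pi * 240 * i / fs) for i in range(n)]
rng = random.Random(1)
hiss = [rng.uniform(-0.3, 0.3) for _ in range(n)]
gap = [0.0] * n

for name, frame in [("vowel", vowel), ("hiss", hiss), ("gap", gap)]:
    print(name, classify(frame, fs))
```

Real speech frames are messier than these synthetic ones. Voiced fricatives such as [z] mix both kinds of activity, so a single threshold can misclassify them.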
 
 


3. Annotating the waveform

  • This is a matter of both purpose and taste. You might want to record orthography, phonemes, allophones, phones or even features.
  • In the following example, the text of the utterance is laid out above the wave in regular spelling, and below it in narrow phonetic transcription.
  • The narrow transcription reflects the finer detail of how this instance of speech was actually pronounced on this occasion.
  • The locations of the letters of the orthographic version, or the characters of the phonetic version, show, approximately, the parts of the speech wave that are most closely related to each phone.
  • To understand what that means, recall how speech articulation is organized:
    • Speech is not organized like writing, as a string of discrete entities.
    • The various articulator movements needed for adjacent phonemes are woven together simultaneously (coarticulation) so that at any moment there is activity for two or more phonemes.
    • For example, this speaker would have lowered his velum and raised his tongue blade for the /n/ of on while he was still articulating /ɔ/, then he would spread the rounded lips of /ɔ/ during the alveolar occlusion of /n/, then he would withdraw the lowered velum and elevated tongue blade of /n/ while he was articulating /ɑ:/.
    • Only rarely, and briefly at that, do you find a moment with activity for uniquely one phoneme.
  • You can read more about coarticulation here.
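In Praat, annotations like these are usually stored in a TextGrid. A TextGrid's interval tiers line up labelled stretches of time with the sound. The following sketch imitates that idea in plain Python. The Interval and IntervalTier classes are hypothetical and are not Praat's API. The boundary times are invented for illustration and were not measured from the recording. Each boundary is a decision the annotator has to make, and coarticulation is exactly what makes that decision arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    start: float  # seconds
    end: float    # seconds
    label: str


class IntervalTier:
    """A named run of contiguous, labelled intervals (hypothetical class)."""

    def __init__(self, name, intervals):
        # Each interval must begin exactly where the previous one ended:
        # the tier has no gaps and no overlaps.
        for prev, cur in zip(intervals, intervals[1:]):
            if cur.start != prev.end:
                raise ValueError(f"gap or overlap at {cur.start} s")
        self.name = name
        self.intervals = intervals

    def label_at(self, t):
        """Label of the interval containing time t, or None if outside the tier."""
        for iv in self.intervals:
            if iv.start <= t < iv.end:
                return iv.label
        return None


# Invented boundary times for the start of "on our website" (not measured).
words = IntervalTier("words", [
    Interval(0.00, 0.20, "on"),
    Interval(0.20, 0.38, "our"),
])
phones = IntervalTier("phones", [
    Interval(0.00, 0.11, "ɔ"),
    Interval(0.11, 0.20, "n"),
    Interval(0.20, 0.38, "ɑ:"),
])

t = 0.15
print(words.label_at(t), phones.label_at(t))  # on n
```

Keeping the tiers separate lets the word and phone boundaries be placed independently. This mirrors the orthographic and phonetic layers written above and below the wave.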



4. Recognizing features

Example 1:

 
  • The text of the utterance is laid out above the wave in regular spelling, and below it in narrow transcription.
  • The sequence on our we.. is periodic right through. The consonant [n] starts and ends with well-defined changes in the waveform. In contrast, there are no clear boundaries for [w]. This is a warning that setting up boundaries in the speech waveform is an exceedingly arbitrary process.
  • The vibrations at (1) are voicing during the occlusion of [b], while the transient at (2) is the burst. The vibrations at (1) become progressively weaker until they disappear. This is due to the rising air pressure in the vocal tract during the voiced occlusion, which means that the pressure drop across the glottis gets gradually smaller, and in consequence the airflow through the glottis diminishes and the glottal vibrations are weakened.
  • In contrast, the vibrations at (3) do not belong to [t], which is voiceless. The [t] occlusion is confirmed by the silence at (4), and the transient and aperiodic hissing at (5) are the burst and aspiration respectively.
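A silent occlusion followed by an abrupt burst, as at (4) and (5), is one of the easiest patterns to find automatically. The following is a minimal sketch, assuming a mono list of samples. It computes the RMS amplitude in short frames and finds the first run of near-silent frames lasting at least a minimum duration. It then reports where that run starts and where sound resumes. The function names, the 5 ms frame and both thresholds are hypothetical choices. The voiced occlusion of [b] at (1) would not be found this way, because voicing fills it. The method also cannot tell an occlusion from a pause; for that you still need to listen.

```python
import math
import random


def rms(frame):
    """Root-mean-square amplitude of a frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def find_occlusion(signal, fs, frame_ms=5.0, silence_rms=0.01, min_ms=20.0):
    """Find the first silent stretch of at least min_ms that is followed by sound.

    Returns (silence_start_s, sound_resumes_s) or None. The frame size and
    thresholds are ad hoc assumptions, not standard values.
    """
    step = int(fs * frame_ms / 1000)
    envelope = [rms(signal[i:i + step])
                for i in range(0, len(signal) - step + 1, step)]
    min_frames = math.ceil(min_ms / frame_ms)
    run_start = None
    for k, value in enumerate(envelope):
        if value < silence_rms:
            if run_start is None:
                run_start = k          # a quiet stretch begins
        else:
            if run_start is not None and k - run_start >= min_frames:
                # Long enough to count: sound resumes here (burst onset).
                return run_start * step / fs, k * step / fs
            run_start = None           # too short (or none): keep looking
    return None


# Synthetic vowel + 60 ms silence + click-like burst + aspiration noise.
fs = 16000
vowel = [0.5 * math.sin(2 * math.pi * 120 * i / fs) for i in range(round(0.100 * fs))]
silence = [0.0] * round(0.060 * fs)
burst = [0.8 if i % 2 == 0 else -0.8 for i in range(16)]
rng = random.Random(3)
aspiration = [rng.uniform(-0.2, 0.2) for _ in range(round(0.040 * fs))]
signal = vowel + silence + burst + aspiration

print(find_occlusion(signal, fs))  # (0.1, 0.16)
```

The minimum duration stops brief dips in amplitude from being reported. Any silence longer than it is reported, whether it is an occlusion or a pause.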



Example 2:

 
  • This example shows two instances of the glottal stop allophone of /t/, at (1) and (2), which occurs in this accent when /t/ precedes some other consonant. Note the glottal vibrations, with a substantial amplitude at (1) and a small amplitude at (2). The glottal stop is not specified for voicing: a complete glottal occlusion would stop all airflow, but a weaker glottal constriction, with vocal folds not too stiff, would allow some vibration.
  • The unvoiced hissing at (3) belongs to [f].



Example 3:

 
  • Once again, a periodic sequence of three vowels and two sonorant consonants, ..erial. And once again no indication of any clear boundaries within that sequence.
  • The silent occlusion of [t] at (1), and the burst and massive aspiration at (2).
  • The expected final [ɫ] is vocalized, a typical feature of this accent.



Example 4:

 
  • The voiceless occlusion of [p] at (1) is clearly defined. It is followed by unexpected, violent low-frequency vibrations during the aspiration at (2), possibly due to interaction with the microphone, possibly not. Pop filters or windshields can be fitted on microphones to dampen unwanted aperiodic activity.
  • Voicing during the occlusion of [b] at (3) is followed by the burst at (4).
  • There is aperiodic hissing at (5) for the final fricative [ʃ].



Example 5:

 
  • The silent occlusion of unvoiced [p] at (1) is followed by the burst and aspiration at (2), again exhibiting violent low-frequency vibrations (as in Example 4).
  • There is another glottal allophone of /t/ at (3), with weak glottal vibration.
  • Aperiodic hissing for [s] at (4).



Example 6:

 
  • Voicing and hissing combined at (1) for [z] (the aperiodic component was only visible after zooming in and can’t be seen here).
  • The silent occlusion of [t] at (2) is followed by the burst and aspiration at (3).
  • Aperiodic hissing of [z] at (4) combined with voicing (again, this was not obvious until after zooming in and is not visible here). Even in this example there are a few violent low-frequency vibrations, at (5).
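The weak hissing of [z] at (1) and (4) was hidden under strong voicing at this zoom level. One simple numerical way to expose a weak high-frequency component is to take the first difference of the signal, y[n] = x[n] − x[n−1]. Differencing boosts high frequencies relative to low ones, so you can compare the energy of the differenced frame with that of the original frame. The sketch below rests on stated assumptions: synthetic 120 Hz voicing plus low-level uniform noise stand in for a real [z]. hiss_ratio is a hypothetical name, not a Praat command.

```python
import math
import random


def energy(xs):
    """Mean squared amplitude."""
    return sum(x * x for x in xs) / len(xs)


def hiss_ratio(frame):
    """Energy of the first difference divided by the energy of the frame.

    Differencing boosts high frequencies, so a weak noise component riding
    on strong low-frequency voicing raises this ratio sharply.
    """
    diff = [b - a for a, b in zip(frame, frame[1:])]
    return energy(diff) / energy(frame)


# Assumed: 16 kHz sampling, 40 ms frames, 120 Hz voicing.
fs = 16000
n = round(0.040 * fs)
voicing = [0.5 * math.sin(2 * math.pi * 120 * i / fs) for i in range(n)]
rng = random.Random(2)
# Noise at a fifth of the voicing amplitude stands in for the hiss of [z].
voiced_fricative = [v + rng.uniform(-0.1, 0.1) for v in voicing]

print(f"voicing only:      {hiss_ratio(voicing):.4f}")
print(f"voicing + hissing: {hiss_ratio(voiced_fricative):.4f}")
```

In practice a spectrogram in Praat is the more usual way to see such hiss, as high-frequency energy. The ratio here is only a rough numerical stand-in.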



5. Segmenting the waveform

xxx

xxx


©Sidney Wood and SWPhonetics, 1994-2012