Definition, segments and domain
Coarticulation is typically defined as articulators moving simultaneously but for different phonemes, or as phonemes overlapping in time. Both definitions imply a belief in some sort of underlying “segment” that has its physical expression in articulatory behaviour. Indeed, Liberman & Mattingly 1985 insist that some sort of discrete representation is always implied, even for those who would deny it. The classical arguments in favour of discrete underlying segments in this context have been summarized by e.g. Pisoni & Luce 1987 and Löfqvist 1990. Although many authors nowadays seem to prefer not to commit themselves on what such “segments” might be, an abstract definition of the phoneme is adequate for the moment.
Coarticulation as such has been an object of study ever since Menzerath & Lacerda’s pioneering 1933 investigation of lip movement and nasal airflow, but the phenomenon had been noticed much earlier by Sweet 1877:56, 60-63, who recognized that speech sounds were momentary points “in a stream of incessant change” consisting of inevitable, simultaneous transitional on- and off-glides between them. This remained the accepted paradigm until Joos 1948:104-108 reported different spectra for a vowel phoneme in different consonant environments, which was interpreted as evidence of coarticulation extending beyond the transitions. The Kozhevnikov & Chistovich 1965 syllable model offered an explanation for the simultaneous expression of serially ordered phonemes, although subsequent reports (e.g. Öhman 1966) appeared to contradict it, since the domain of coarticulation was seen to extend into neighbouring syllables; indeed, some investigators reported domains of several syllables.
Incompatible legacies of different scientific traditions have led to controversies concerning the domain of coarticulation, the relation of coarticulation to assimilation, and the nature of coarticulation itself, all epitomized in the debate between Hammarberg 1976, 1982 and Fowler 1980, 1983. Overviews of current work on coarticulation and related theoretical topics have also been given by Daniloff & Hammarberg 1973, Kent & Minifie 1977, Kent 1983 and Lindblom 1986. Models proposed for coarticulation have tended to fall into two main classes, depending on whether their driving principle is coproduction or feature-spreading. But opinions also differ as to whether coarticulation is an intentional and preplanned input to the speech motor system or instead the physiological consequence of subcortical control constraints and of the mechanical properties of the articulators themselves. Opinions differ further as to how knowledge and memory access are handled. “Look-ahead” versions of models must have access to at least a major portion of the current syntagm, while subcortical models are restricted to whatever information is initially passed down about the current segment. Coarticulation research has typically been concerned with topics such as how far ahead a phoneme may be initiated, how long it may be kept going, what and where its boundaries are, and in what sense simultaneous phonemes are serially ordered. All this implies that articulatory and acoustic attributes can be singled out, delimited, identified and assigned to their respective phonemes.
Investigations of coarticulation have typically covered just one or two articulators (frequently the lips, mandible, tongue blade or velum), exploiting and depending on the technology currently available: electromyography, movement transduction, optical tracking, dynamic palatography, fibrescopy, cinematography, x-ray motion filming (automatic pellet-tracking or manually traced pictures as here), or the interpretation of acoustic features of the speech wave. Very rarely, if at all, has work been reported on the dynamic coordination of all gestures throughout the supralaryngeal vocal tract.
A growing area of interest is the study of the gestures themselves and of their place in phonological theory (e.g. Browman & Goldstein 1989 and Boyce et al. 1990) and in speech perception (e.g. Fowler 1986, Liberman & Mattingly 1985, Stevens & Blumstein 1981).
Cortical, subcortical, preplanned or accidental?
The coproduction approach usually sees coarticulation as a low-level phenomenon, the inevitable physiological consequence of, for example, the intrinsic timing requirements that vocal tract constraints impose on the gestures involved (Fowler 1980). In contrast, Liberman et al. 1967 implied high-level control when they emphasized the necessity for restructuring phonemes to overcome the inability of the ear to resolve discrete elements arriving at the rates of phoneme flow customary in speech, or of the articulators to produce distinct gestures at such rates. They suggested that “dividing the load among the articulators allows each to operate at a reasonable pace, and tightening the code keeps the information rate high. It is this kind of parallel processing that makes it possible to get high speed performance with low speed machinery…”. If such restructuring of articulation is indeed part of the encoding process, as they believe, then it should be under close high-level control, i.e. a preplanned and integral part of the programming.
Coproduction models generally emphasize simultaneous articulation, especially of vowels and consonants, but individual instances of these models can be mutually incompatible. For example, Kozhevnikov & Chistovich 1965, chap. 4, posited that all the several manoeuvres of the (open) syllable can be initiated at once provided there is no antagonism between them (in which case some gestures must be delayed, requiring some measure of preplanning), whereas Öhman 1966 maintained that coarticulation is precisely the result of the summation of (sometimes) antagonistic consonant features superimposed on a continuous diphthongal vowel-to-vowel movement (i.e. unplanned and accidental).
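The idea of a consonant gesture superimposed on a continuous vowel-to-vowel movement can be made concrete with a toy numerical sketch. The following is not Öhman's published formulation: it tracks a single hypothetical articulatory dimension (say, degree of constriction on a 0 to 1 scale), and the function names, the raised-cosine shapes and all numerical values are assumptions chosen purely for illustration.

```python
import math


def vowel_trajectory(t, v1, v2):
    """Smooth movement from vowel target v1 to v2 as t runs from 0 to 1.

    The raised-cosine interpolation is an illustrative assumption.
    """
    w = 0.5 * (1.0 - math.cos(math.pi * t))
    return (1.0 - w) * v1 + w * v2


def consonant_weight(t, centre=0.5, half_width=0.2):
    """Weight of the consonant gesture: 1 at its centre, 0 outside +/- half_width.

    centre and half_width are hypothetical timing parameters.
    """
    d = abs(t - centre)
    if d >= half_width:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * d / half_width))


def superimposed(t, v1, v2, c):
    """Vowel-to-vowel movement with the consonant target c blended in by its weight."""
    v = vowel_trajectory(t, v1, v2)
    k = consonant_weight(t)
    return v + k * (c - v)


if __name__ == "__main__":
    # Hypothetical targets: first vowel 0.2, second vowel 0.8, consonant closure 1.0.
    for i in range(11):
        t = i / 10
        print(f"t={t:.1f}  position={superimposed(t, 0.2, 0.8, 1.0):.3f}")
```

In this sketch the consonant target is fully reached only at the centre of the consonant gesture; on either side of it the output is a blend of the consonant target and the vowel position at that moment, so the vowel context shows through the consonant without any look-ahead planning, which is the sense in which such an account treats coarticulation as unplanned.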