Experimental speech research has traditionally focussed on factorial
experiments, that is, experiments in which a number of factors are defined
that are hypothesised to influence some aspect of speech behaviour, in
production or in perception (see Chapter 9). The amount of speech
in these experiments has typically been small, if only because it was
practically impossible to record large amounts of speech in production
experiments or to generate large amounts for perception experiments. The
major causes of this limitation were the tight control of the speech
material needed for well-designed factorial experiments and the time
required from the subjects. Tight control is necessary to prevent the
outcome of factorial experiments from being
meaningless: this type of experiment requires that all conceivable factors
other than the small number under study be kept constant, while the
experimental factors themselves are varied over a limited range. It is not our
intention to criticise factorial experiments, if
only because they have contributed to virtually all the knowledge we have
about speech and because until recently there was hardly an alternative. But
it must be acknowledged that, precisely because of the tight control, the speech
used in the older experiments may not have been exactly ``communicative''. In
the majority of cases the subjects performed in situations quite remote
from normal communicative behaviour; therefore, some caution should be
exercised in generalising the results of controlled
experiments to ``normal communicative'' speech.
Another reason for caution in interpreting the results of factorial
experiments is the possibility that the
experimenter did not completely succeed in keeping all non-experimental
factors constant: it may be the case that non-experimental factors did co-vary
with experimental ones, thereby being responsible for at least part of the
effects attributed to the experimental factor(s). A case in point is
intonation research, which has concentrated largely on the effects of pitch
and duration. There is, however, increasing evidence that other factors,
such as spectral structure, spectral slope and spectral dynamics, also play
a role, and perhaps quite an important one. In short, there is a danger
that factorial
experiments lead to overestimating the impact of
the factors under investigation, at the expense of factors that were
supposed to be constant but actually co-varied so as to reinforce the
effects of the experimental factors.
Now that very large corpora are becoming available, it is possible to set up
another type of experiment, in which the behaviour of one or more specific
factors is investigated in a very large, perhaps even comprehensive, number
of different contexts. Instead of trying to neutralise the effect of
concomitant factors by keeping them constant (which will normally mean that
one of the many different levels of such a factor is selected, e.g. a
voiceless stop as the right neighbour of the phonemes under study, or only
syllables carrying a prominence-lending High-Low pitch contour), one may
instead sample many different contexts. Of course, in order to make this
type of research
feasible, one has to assume that subject effects can be treated in exactly the
same way as context effects, because it will still be extremely difficult to
have subjects perform for very long periods of time. In designing
corpus-based experiments one must be aware of the extreme skewing of many
frequency distributions observed in spoken language. For instance, in all
languages for which data on phoneme frequencies are available, it has
appeared that within a system some phonemes occur much more often than
others. Random sampling would therefore leave one with a very high
likelihood of missing infrequent phonemes, and of missing possible contexts,
unless the total corpus is made excessively large.
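The effect is easy to demonstrate with a small simulation. The sketch below
assumes an invented forty-phoneme inventory whose token frequencies follow a
Zipf-like curve, and counts how many of the possible ordered phoneme pairs
are attested in random samples of increasing size; the inventory size, the
weights and the use of pairs as ``contexts'' are illustrative assumptions,
not data from any real language.

\begin{verbatim}
import random

random.seed(1)

# Invented inventory: 40 phoneme types with Zipf-like token frequencies.
phonemes = list(range(40))
weights = [1.0 / rank for rank in range(1, 41)]
n_pairs = len(phonemes) ** 2        # all ordered pairs, our toy "contexts"

def pair_coverage(n_tokens):
    """Fraction of the possible phoneme pairs attested in a random sample."""
    sample = random.choices(phonemes, weights=weights, k=n_tokens)
    attested = set(zip(sample, sample[1:]))
    return len(attested) / n_pairs

for n in (1000, 10000, 100000):
    print(f"{n:>6} tokens: {pair_coverage(n):.1%} of pairs attested")
\end{verbatim}

Even the largest of these samples will typically leave some pairs of
infrequent phonemes unattested, while the smaller ones miss a substantial
fraction of all pairs.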
Greedy algorithms [Van Santen (1992)] can be used to find the minimum amount
of linguistic material that covers a maximum number of phenomena, but even
then it cannot be guaranteed that all possibly relevant conditions are
indeed covered: conditions that are not formulated as targets for the
search will be present only by chance.
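To make the idea concrete, the following sketch implements a generic greedy
selection over a toy corpus: at each step it picks the sentence that adds
the most not-yet-covered diphones. The sentences, and the use of letter
pairs as a stand-in for diphones, are invented for illustration; this is a
minimal sketch in the spirit of such algorithms, not the specific procedure
of Van Santen (1992).

\begin{verbatim}
def diphones(sentence):
    """All adjacent letter pairs, a toy stand-in for phonemic diphones."""
    s = sentence.replace(" ", "")
    return {s[i:i + 2] for i in range(len(s) - 1)}

def greedy_select(sentences, targets):
    """Pick sentences one by one, each maximising newly covered targets."""
    selected, remaining = [], set(targets)
    while remaining:
        best = max(sentences, key=lambda s: len(diphones(s) & remaining))
        gain = diphones(best) & remaining
        if not gain:                  # nothing new can be covered: stop
            break
        selected.append(best)
        remaining -= gain
    return selected, remaining

corpus = ["de man liep", "een lange dag", "pad en deel", "geel en diep"]
targets = set().union(*(diphones(s) for s in corpus))
chosen, uncovered = greedy_select(corpus, targets)
print(len(chosen), "of", len(corpus), "sentences leave",
      len(uncovered), "targets uncovered")
\end{verbatim}

Note that only units formulated as targets drive the selection; anything
else ends up in the selected material merely by chance, which is precisely
the coverage gap mentioned above.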
Since complete coverage is not practically attainable, corpus research must
deal with missing data in one way or another.
Attempts have been made to handle missing data by means of knowledge-based
arithmetic models that include all relevant parameters; alternatively,
``blind'' statistical modelling techniques such as CART (Classification And
Regression Trees) can be used. There seems to be some preference for
arithmetic models, unless one can guarantee that the missing data are not
concentrated in a few subspaces [Van Santen (1994)].
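As a concrete illustration of the ``blind'' alternative, the sketch below
fits a regression tree to a handful of invented context/duration pairs and
lets it predict a duration for a context combination that is absent from the
data. The binary feature coding and the duration values are assumptions
made up for the example, and the code relies on the scikit-learn library.

\begin{verbatim}
# Requires scikit-learn; features and durations are invented.
from sklearn.tree import DecisionTreeRegressor

# Each row codes a vowel context:
# [stressed?, phrase-final?, voiceless right neighbour?]
X_attested = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1], [1, 0, 1], [0, 0, 0]]
y_ms = [95.0, 140.0, 60.0, 110.0, 80.0, 70.0]   # vowel durations in ms

tree = DecisionTreeRegressor(max_depth=2).fit(X_attested, y_ms)

# Predict a duration for a context combination missing from the corpus.
print(tree.predict([[0, 1, 0]]))
\end{verbatim}

A tree trained in this way can only interpolate from whatever contexts
happen to be attested, which is exactly why it becomes unreliable when the
missing data cluster in a few subspaces of the feature space.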