Test procedures


As indicated in Section 12.2, speech output assessment techniques can be differentiated along a number of parameters, but no parameters related to the actual test procedure were included there. Test procedures can vary with respect to subjects (see Section 12.3.1), stimuli, and response modality.

Stimuli can vary along a large number of parameters, the most important of which are listed below.

In Section 12.7, summary descriptions of tests are given where the stimuli have been categorised along these stimulus parameters. Chapter 9 on methodology should also be consulted.

Response modality can vary along a number of parameters as well. The choice seems to be mainly determined by three factors: comparative versus diagnostic, functional versus judgment, and TTS development versus psycholinguistic interest. Of the five types of response modalities listed below, 1 and 2 are mainly used within the glass box approach (1 in TTS development, 2 in psycholinguistically oriented research), whereas 3, 4 and 5 are more common in the black box approach. The latter three response modalities can be further differentiated: 3 and 4 are functional in nature (3 in TTS development, 4 in psycholinguistically oriented research), whereas 5 represents judgment testing. The list of response modalities also distinguishes between off-line tests, where subjects are given some time to reflect before responding, and on-line tests, where an immediate response is expected from the subjects, tapping the perception process before it is finished.
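The classification in the preceding paragraph can be written out as a small lookup table. The sketch below is only an illustration: the names `Modality`, `MODALITIES` and `modalities_for` are hypothetical, and the entries record tendencies ("mainly used", "more common"), not strict rules. Where the text does not state a property, the field is left as `None`.

```python
from typing import NamedTuple, Optional


class Modality(NamedTuple):
    approach: str            # "glass box" or "black box"
    nature: Optional[str]    # "functional" or "judgment"; None if not stated
    interest: Optional[str]  # typical field of use; None if not stated


# Keys are the response modality numbers used in the text.
MODALITIES = {
    1: Modality("glass box", None, "TTS development"),
    2: Modality("glass box", None, "psycholinguistic research"),
    3: Modality("black box", "functional", "TTS development"),
    4: Modality("black box", "functional", "psycholinguistic research"),
    5: Modality("black box", "judgment", None),
}


def modalities_for(approach: str) -> list:
    """Return the modality numbers typically associated with an approach."""
    return sorted(n for n, m in MODALITIES.items() if m.approach == approach)
```

For example, `modalities_for("black box")` selects modalities 3, 4 and 5.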

The last response modality will be discussed in some more detail. Pavlovic and co-workers have conducted an extensive series of studies [Pavlovic et al. (1990)] comparing different types of scaling methods that can be used in judgment tests to evaluate speech output. Much attention was paid to:

Pavlovic et al. stress that there are important differences between the two types of scaling methods, for example the fact that categorical estimation results in an interval scale, whereas magnitude estimation results in a ratio scale. The former leads to the use of raw ratings, the calculation of the arithmetic mean, and the comparison of conditions in terms of differences; the latter leads to the use of the logarithm of the ratings, the geometric mean, and the comparison of conditions in terms of ratios. The differences also have implications for the type of conclusions that can be drawn from the test results. Both the categorical estimation method (with a 20-point scale) and the magnitude estimation method have been included in SOAP as standard SAM Overall Quality test procedures (see Section 12.7.11).
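The difference in analysis between the two scaling methods can be illustrated with a minimal sketch. The function names and the ratings in the usage note are hypothetical and serve only as illustration; this is not the SOAP implementation. Categorical ratings are summarised by the arithmetic mean and compared by differences, whereas magnitude estimates are summarised by the geometric mean (computed via the mean of the log ratings) and compared by ratios.

```python
import math
from statistics import fmean


def summarise_categorical(ratings):
    """Interval scale: arithmetic mean of the raw ratings."""
    return fmean(list(ratings))


def summarise_magnitude(ratings):
    """Ratio scale: geometric mean, i.e. exp of the mean log rating.

    Ratings must be strictly positive for the logarithm to be defined.
    """
    return math.exp(fmean([math.log(r) for r in ratings]))


def compare_categorical(ratings_a, ratings_b):
    """Compare two conditions as a difference of arithmetic means."""
    return summarise_categorical(ratings_a) - summarise_categorical(ratings_b)


def compare_magnitude(ratings_a, ratings_b):
    """Compare two conditions as a ratio of geometric means."""
    return summarise_magnitude(ratings_a) / summarise_magnitude(ratings_b)
```

With invented ratings, `compare_categorical([14, 16, 15], [10, 12, 11])` expresses the advantage of the first condition as a difference in scale points, while `compare_magnitude([40, 90], [10, 40])` expresses it as a factor. The two numbers support different kinds of statements ("so many points better" versus "so many times better") and should not be mixed.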

Recommendations on choice of response modality


  1. For rapid judgment testing, use intra-subject ("internal comparison") categorical estimation, and when you do, use at least a 10-point scale.
  2. To compare results across tests ("external comparison"), use magnitude estimation, and when you do, use the line length drawing procedure, asking subjects to express the quality of the stimulus relative to the most ideal (human) speech they can imagine.


EAGLES SWLG SoftEdition, May 1997.