Glass box vs. black box

   

Text-to-speech systems generally comprise a range of modules, each of which takes care of a specific task. The first module (or complex of modules) converts an orthographic input string into an abstract linguistic code that explicitly represents sounds and prosodic markers. Various modules then act upon this symbolic representation. Typically, one module concatenates the primitive building blocks (phonemes, diphones) in their appropriate order, and another implements whatever coarticulation is needed to obtain smooth, human-like transitions between successive building blocks. Prosodic modules, taking into account the positions of word stresses, sentence accents, and phrasal and sentence boundaries, are then called upon to provide an appropriate temporal organisation (local accelerations and decelerations, pauses) and speech melody.
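The modular organisation described above can be pictured as a chain of functions, each consuming the output of the previous one. The following is only a minimal sketch: the module names (`to_linguistic_code`, `concatenate_units`, `assign_prosody`), the data shapes and the durations are hypothetical toy stand-ins, not taken from any actual system.

```python
from typing import Callable, List

# Assumption: a module is any function mapping one intermediate
# representation to the next. Real systems use far richer structures.
Module = Callable[[object], object]


def to_linguistic_code(text: str) -> List[str]:
    """Toy text analysis: one symbol per letter, '|' after each word."""
    symbols: List[str] = []
    for word in text.lower().split():
        symbols.extend(ch for ch in word if ch.isalpha())
        symbols.append("|")  # word boundary marker
    return symbols


def concatenate_units(symbols: List[str]) -> List[dict]:
    """Toy unit concatenation: one unit record per symbol, in order."""
    return [{"unit": s} for s in symbols]


def assign_prosody(units: List[dict]) -> List[dict]:
    """Toy prosody: fixed unit duration, longer pause at boundaries."""
    return [dict(u, duration_ms=200 if u["unit"] == "|" else 80) for u in units]


def run_pipeline(text: str, modules: List[Module]):
    """Feed the text through every module in turn."""
    data = text
    for module in modules:
        data = module(data)
    return data


PIPELINE = [to_linguistic_code, concatenate_units, assign_prosody]
result = run_pipeline("Hi you", PIPELINE)
```

Because every stage has an explicit input and output, any intermediate representation can be inspected or replaced, which is the property that module-level testing relies on.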

End users will typically be interested in the performance of a system as a whole. They will consider the system as a black box that accepts text and outputs speech, a monolith without any internal structure. For them only the quality of the output speech matters. In this way systems developed by different manufacturers can be compared, or the improvement of one system relative to an earlier version can be traced over time (comparative testing). However, if the output is less than optimal, a black box test cannot pinpoint the exact module or modules that caused the problem. For diagnostic purposes, therefore, designers often set up glass box evaluations of an experimental character. If the effects of all modules but one are kept constant and the characteristics of the free module are varied systematically, any difference in the assessment of the system's output can be attributed to the variations in that target module. Glass box testing, of course, presupposes that the researcher has control over the input and output of each individual module.
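The contrast between the two approaches can be sketched in code. In the illustration below everything is an assumption made for the example: the toy modules, the `judge` function (a crude stand-in for listener ratings or any other assessment procedure) and the helper names `black_box_score` and `glass_box_compare`. The black box function sees only text in and output out; the glass box function keeps all modules fixed except the one in a chosen slot.

```python
from typing import Callable, Dict, List, Sequence

Module = Callable[[object], object]


def run_pipeline(data, modules: Sequence[Module]):
    """Pass the data through each module in order."""
    for module in modules:
        data = module(data)
    return data


def black_box_score(system: Callable[[str], object],
                    texts: Sequence[str],
                    judge: Callable[[object], float]) -> float:
    """Mean judgement of the final output; internals are not visible."""
    scores = [judge(system(t)) for t in texts]
    return sum(scores) / len(scores)


def glass_box_compare(baseline: List[Module], slot: int,
                      variants: Dict[str, Module],
                      texts: Sequence[str],
                      judge: Callable[[object], float]) -> Dict[str, float]:
    """Vary only the module at `slot`; all other modules stay constant."""
    results = {}
    for name, variant in variants.items():
        modules = list(baseline)
        modules[slot] = variant  # the single free (target) module
        system = lambda text, mods=modules: run_pipeline(text, mods)
        results[name] = black_box_score(system, texts, judge)
    return results


# Hypothetical toy modules, for illustration only.
def analyse(text):
    return text.split()


def no_pause(words):
    return [(w, 0) for w in words]


def final_pause(words):
    last = len(words) - 1
    return [(w, 300 if i == last else 0) for i, w in enumerate(words)]


def judge(output) -> float:
    """Stand-in for a listener rating: rewards an utterance-final pause."""
    return 1.0 if output and output[-1][1] > 0 else 0.0


texts = ["the cat sat", "hello world"]
scores = glass_box_compare([analyse, no_pause], slot=1,
                           variants={"no_pause": no_pause,
                                     "final_pause": final_pause},
                           texts=texts, judge=judge)
```

Since the analysis module and the test texts are identical across runs, any difference between the entries of `scores` is attributable to the prosody variant alone.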

Recommendations on choice of test methodology

  1. Use a glass box approach if you want diagnostics in order to improve your speech output system.
  2. Use a black box approach if you want to assess the overall performance of speech output systems.

The dichotomy between glass box and black box testing is basic to speech output testing, which has led some researchers to propose a strict terminological division whereby ``evaluation'' signifies glass box testing (or: diagnostic evaluation) only, and ``assessment'' is reserved exclusively for black box testing (or: performance evaluation). In this chapter we will use the terms ``testing'', ``evaluation'' and ``assessment'' interchangeably, and add disambiguating adjectives whenever there is a risk of confusion.




EAGLES SWLG SoftEdition, May 1997.