In a sense there is only one ultimate criterion that determines the quality of a speech output system, viz. its overall quality within a given application. Judgment tests usually include one or more rating scales covering such global aspects as ``overall quality'', ``naturalness'' and ``acceptability''. A functional approach to global assessment would be to determine whether users of speech output, when given the choice, prefer to work with the machine or with the human original that the machine is intended to simulate. Alternatively, one may determine whether the information exchange is as successful in machine-to-human as it is in human-to-human situations.
On the other hand, one may be interested in determining the quality of specific aspects of a speech output system. This calls for an analytic listening mode, in which listeners are requested to pay particular attention to selected aspects of the speech output. Again, both judgment and functional tests can be, and have been, designed to address the quality of specific aspects of a speech output system. Listeners may be asked, for instance, to rate the clarity of vowels and consonants, the appropriateness of stresses and accents, the pleasantness of voice quality, and the tempo. Functional tests have been designed to test the intelligibility of individual sounds (phoneme monitoring), of combinations of sounds (syllable monitoring), and of whole words (word monitoring), in isolation as well as in various types of context [e.g., Nusbaum et al. (1986), Ralston et al. (1991)].