
 

What are speech output systems?

By a speech output system we mean some artifact, whether a dedicated machine or a computer programme, that produces signals intended to be functionally equivalent to speech produced by humans. At present, speech output systems generally produce audio signals only, but laboratory systems are being developed that supplement the audio signal with the visual image of the (artificial) talker's face [Benoît (1991), Benoît et al. (1992)]. Audio-visual (or bi-modal) speech output is more intelligible than audio-only output, especially when the audio channel is of degraded quality. In this chapter we will not be concerned with bi- or multimodal speech output systems; we concentrate on audio-only output instead.

We exclude from the domain of speech output systems such devices as tape recorders and other, more advanced, systems that output speech on the basis of complete, pre-stored messages ("canned speech" or "copy synthesis"), irrespective of the type of coding or information compression used to save storage space. We crucially limit our definition to systems that allow the generation of novel messages, either from scratch (i.e. entirely by rule) or by recombining shorter pre-stored units. This definition also includes hybrid synthesis systems in which individually stored words (e.g. digits) are substituted into information slots in a carrier sentence (e.g. in time-table consultation services).
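The slot-filling idea behind hybrid synthesis can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the carrier sentence, the slot names, the unit inventory and the file names are hypothetical and do not describe any particular system. The sketch only assembles the playback order of pre-stored units; it does not produce audio.

```python
# Hypothetical carrier sentence for a time-table service:
# "The train to <destination> departs at <hour> <minute>."
# Fixed parts are pre-stored recordings; items in braces are information slots.
CARRIER = [
    "the_train_to.wav",
    "{destination}",
    "departs_at.wav",
    "{hour}",
    "{minute}",
]

# Hypothetical inventory of individually stored words (file names are invented).
UNITS = {
    "utrecht": "utrecht.wav",
    "nine": "nine.wav",
    "fifteen": "fifteen.wav",
}


def assemble(carrier, slots, units):
    """Return the ordered list of stored units that makes up one message."""
    sequence = []
    for item in carrier:
        if item.startswith("{") and item.endswith("}"):
            slot_name = item[1:-1]       # e.g. "hour"
            word = slots[slot_name]      # e.g. "nine"
            sequence.append(units[word])  # look up the stored recording
        else:
            sequence.append(item)        # fixed part of the carrier sentence
    return sequence


message = assemble(
    CARRIER,
    {"destination": "utrecht", "hour": "nine", "minute": "fifteen"},
    UNITS,
)
```

Because the slot values vary from one call to the next, such a system can produce messages that were never stored as a whole, which is what separates it from canned speech in the sense used above.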

It seems to us that two basic types of speech output systems have to be distinguished on the basis of their input, namely text-to-speech (TTS) and concept-to-speech (CTS). Other, more complex, systems combine characteristics of these two.




EAGLES SWLG SoftEdition, May 1997.