
Introduction

In this chapter we do not discuss the available technical approaches nor argue the merits of each. We simply indicate what could be the needs and expectations of an application developer, should the solution be a spoken language or voice processing system.

For speech technologies, as for many technologies, each solution is a unique one that involves several types of expertise. A technology provider may come up with an integrated system (speech recogniser and/or synthesiser with ergonomics, oral dialogue, communication, software and hardware experts) that outperforms others, although the assessment of the individual modules in isolation may show equivalent performance.

As a first recommendation we suggest that application developers request information from several technology providers to be sure that they are aware of the major international assessment methodologies and standards when available and applicable. Afterwards the application developer should request detailed proposals for the specific application to be set up. For the technology evaluation it is of paramount importance to account for the application characteristics as it is very hard to measure the quality of a speech recogniser or the comprehensibility of speech synthesis output in an absolute manner.

Spoken language systems are an appropriate combination of several modules including recognition of speech input, recognition of speaker identity (verification or identification), speech output generation and synthesis (including speech coding), and/or man-machine interaction management.

A simple use of a spoken language system consists of recognising speaker utterances, interpreting them with respect to the application, deriving a meaning (or a command), and providing consequent feedback to the user (maybe a speech prompt or a system action). This is illustrated in Figure 2.1 for a speech input/output dialogue system.

Figure 2.1: Spoken dialogue system
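
The processing chain described above (recognise, interpret, derive a command, give feedback) can be illustrated with a minimal Python sketch. It is not taken from any particular system: the command table, the function names and the `Feedback` type are hypothetical, and the "recogniser" is a stand-in that receives text instead of audio.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    """What the system gives back to the user after one utterance."""
    prompt: Optional[str] = None  # a speech prompt to be synthesised
    action: Optional[str] = None  # a system action to be carried out


# Hypothetical command table for a toy application.
COMMANDS = {"lights on": "LIGHTS_ON", "lights off": "LIGHTS_OFF"}


def recognise(utterance: str) -> str:
    """Stand-in for a speech recogniser: the input is already text."""
    return utterance.strip().lower()


def interpret(words: str) -> Optional[str]:
    """Map the recognised words to an application command, if any."""
    return COMMANDS.get(words)


def respond(command: Optional[str]) -> Feedback:
    """Choose the feedback: a system action, or a prompt asking to repeat."""
    if command is None:
        return Feedback(prompt="Sorry, please repeat.")
    return Feedback(action=command)


def dialogue_turn(utterance: str) -> Feedback:
    """One pass through the chain: recognise, interpret, respond."""
    return respond(interpret(recognise(utterance)))
```

Keeping each stage as a separate function mirrors the modular view taken in this chapter: any one stage could be replaced or assessed on its own, while the behaviour seen by the user depends on the whole chain.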

In order to generalise the use of such systems in different man-machine interaction contexts, a predictive model of performance [Choukri et al. (1988)] needs to be obtained as a function of different identified relevant factors. The definition of those factors has to lead to a set of parameters that can describe a speech processing system. This description has to express two opposed points of view: that of the technology provider (designer) and that of the application developer (buyer). The two points of view have to be distinguished.

Designers should provide evidence of the performance of their systems, together with a measure of the impact of any change. The first contribution of this chapter therefore takes the technology supplier's point of view: it provides detailed guidelines for the specification of speech processing systems, in order to make explicit the operational capabilities offered by the technology. This will allow the technology providers to describe the system performance in a comprehensive way to the application developers.

Buyers need comprehensive information about how each system or device will perform in the specific conditions of their application. The second contribution of this chapter therefore takes the application developer's point of view: it provides detailed guidelines on how to express the requirements of applications that incorporate speech processing systems, so that it is explicit which requirements the operational capabilities of the technology must meet. This will allow the application developers to express their needs in a comprehensive way to the technology providers.

The technology specification is complex, and has to go beyond the single figure of 99% accuracy usually announced by equipment suppliers. This rate depends on numerous parameters, some of which cannot be easily quantified [Pallett (1985), Choukri et al. (1988), Moore (1988)]. In order to focus on the most relevant parameters one needs to adopt a multi-dimensional characterisation of the speech processing system. This characterisation will be called the "system capability profile" (an expression first introduced by Moore).

The application requirement is also complex, too complex to be reflected only by a transaction success rate, and should likewise be described by a multi-dimensional characterisation. This will be referred to as the "application requirement profile".

The objective of this chapter is to list the major factors that permit the definition of the above-mentioned multi-dimensional spaces and, moreover, to provide a way to express a matching process between the two. The chapter consists of forms with keyword entries that relate to the different dimensions, seen by the technology provider as capabilities and system features and by the application developer as requirements. For each module we will provide guidance for the general terminology and specifications, and elaborate on algorithmic aspects, software and hardware implementations, system integration and other features.
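
The idea of matching a capability profile against a requirement profile can be made concrete with a small Python sketch. The three dimensions used here (vocabulary size, speaker independence, response time) are assumptions chosen purely for illustration; they are not the forms defined in this chapter, and the names `Profile` and `unmet_requirements` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Profile:
    """A toy profile with three illustrative dimensions (assumed, not from the text)."""
    vocabulary_size: int        # number of words handled / needed
    speaker_independent: bool   # works (or must work) without speaker training
    response_time_s: float      # response time offered / maximum tolerated


def unmet_requirements(capability: Profile, requirement: Profile) -> List[str]:
    """Return the names of the dimensions on which the system falls short."""
    unmet = []
    if capability.vocabulary_size < requirement.vocabulary_size:
        unmet.append("vocabulary_size")
    if requirement.speaker_independent and not capability.speaker_independent:
        unmet.append("speaker_independent")
    if capability.response_time_s > requirement.response_time_s:
        unmet.append("response_time_s")
    return unmet


# Example: a system described by its provider, and an application's needs.
system = Profile(vocabulary_size=500, speaker_independent=False, response_time_s=0.5)
application = Profile(vocabulary_size=200, speaker_independent=True, response_time_s=1.0)
shortfalls = unmet_requirements(system, application)
```

Returning the list of unmet dimensions, rather than a single pass or fail value, reflects the point made above that neither the system nor the application is adequately summarised by one number.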


EAGLES SWLG SoftEdition, May 1997.