
Speech recognition systems


The expression "speech recognition system" denotes here the module that takes speech input (acoustic waveforms) and delivers either a label or a corresponding command. Designers usually focus on three major elements: the vocabulary (complexity, syntax, size), the environment (bandwidth, noise level, distortion type), and the speakers (stressed/relaxed, trained/untrained).

The major requirements relate to:

In the following we describe most of the factors that arise when specifying a speech recognition technology, or that shape the expectations of users and hence of application developers.

A speech recogniser is built on a model of speech, for which several paradigms exist. The best known are Dynamic Time Warping (DTW), Hidden Markov Modelling (HMM), and Artificial Neural Networks (ANNs). Most approaches distinguish two phases: a training phase and an exploitation phase. The first phase is devoted to learning speech characteristics from data:

The material needed for this phase has a strong influence on the result and is discussed in the following sections. The second phase, exploitation, consists of using the trained system to recognise speech input. The key characteristics of the recogniser are described in the following sections.
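
To make the distinction between the two phases concrete, the following is a minimal sketch of a template-based recogniser that uses Dynamic Time Warping. It is an illustration under stated assumptions, not a description of any particular system: the class name TemplateRecogniser, its methods train and recognise, and the toy one-dimensional "feature vectors" are all hypothetical, and a real recogniser would work on acoustic features extracted from the waveform. Training here simply stores labelled reference utterances; exploitation returns the label of the stored template with the smallest DTW distance to the input.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Each sequence is a list of equal-length tuples of floats; the local
    cost is the Euclidean distance between two vectors.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b: a advances alone
                cost[i][j - 1],      # stretch a: b advances alone
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


class TemplateRecogniser:
    """Hypothetical two-phase recogniser (illustrative names only)."""

    def __init__(self):
        self.templates = []  # list of (label, feature sequence)

    def train(self, label, features):
        # Training phase: store one labelled reference utterance.
        self.templates.append((label, features))

    def recognise(self, features):
        # Exploitation phase: return the label of the closest template.
        if not self.templates:
            raise ValueError("recogniser has not been trained")
        label, _ = min(
            self.templates,
            key=lambda template: dtw_distance(features, template[1]),
        )
        return label


# Toy data (assumption): one-dimensional features for two words.
recogniser = TemplateRecogniser()
recogniser.train("yes", [(0.0,), (1.0,), (2.0,), (1.0,)])
recogniser.train("no", [(5.0,), (4.0,), (5.0,)])

# A slower rendition of "yes": same shape, stretched in time.
slow_yes = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,), (1.0,)]
result = recogniser.recognise(slow_yes)
print(result)
```

The time-stretched input still matches the "yes" template because DTW allows one sequence to advance while the other waits, which is the property that made the technique attractive for utterances spoken at different rates. HMM and ANN recognisers follow the same train-then-exploit pattern, but estimate model parameters during training rather than storing templates.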

EAGLES SWLG SoftEdition, May 1997.