Speech quality and conditions


The conditions under which a recogniser is used greatly influence its performance. Speech quality can be characterised by various properties. A distinction is made between pre-production factors, which influence the way speech is produced, and post-production factors, which influence the way the speech is transmitted from the mouth of the speaker to the recognition system. We have summarised some of these conditions in Table 10.2.

       Parameter                    Easy task               Difficult task
Pre:   Vocabulary choice            distinct words          similar words
       Talking style                read speech             spontaneous speech
                                    constant energy level   fluctuating level
       Recording conditions         undisturbed speech      deteriorated speech
                                                            (e.g. stressed, Lombard effect)
Post:  Electrical characteristics   wide bandwidth          small bandwidth
                                    good transmission       unreliable channel quality
                                    no noise                noise

Table 10.2: Conditions of speech

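Vocabulary choice: Within the vocabulary, words can be chosen to be acoustically very distinct or very similar. For an application (e.g. a set of control words) one would choose the former, since distinct words are less easily confused. For diagnostic purposes the latter serves very well, because small phonetic differences reveal which distinctions the recogniser fails to make (e.g. CVC words, i.e. consonant-vowel-consonant words, see Section 10.3.4).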
Talking style: Firstly, a distinction is made between read speech and spontaneous speech. Read speech is somewhat unnatural, as there are few circumstances in which speech approaches this quality, but it has long been used in the evaluation of speech recognition systems because it is relatively easy to define and reproduce. Spontaneous speech comes in a variety of flavours, but it generally follows a much less well-defined grammar and contains errors, corrections, mispronunciations and stronger prosody. Secondly, the level of the speech can vary. When the level varies strongly within a short time frame (e.g. because the distance between microphone and mouth is not constant), the speech is said to have a large dynamic range. On a more global scale, the speech itself is influenced by the speaking level, which can range from "whispering" to "shouting".
Recording conditions: The recording conditions may vary. One of the most important quantities in this respect is the signal-to-noise ratio (SNR). Databases are often recorded "clean" (high SNR), and adverse conditions, such as environmental noise and crosstalk, are added to the signal at a later stage. However, for some conditions this approach is not valid. With the Lombard effect, for example, speakers change the way they speak in response to the noise around them, and adding noise afterwards cannot reproduce that change; such recordings have to be made under realistic conditions.
Electrical characteristics: The bandwidth has some influence on recognition performance. In principle, band-limited speech contains less information about the speech, which can make the recognition task more difficult. However, some recognition systems deliberately limit the bandwidth to that of telephone speech, even when wide-band speech is available, because band limiting reduces the amount of data while keeping most of the speech information. It also provides some trivial filtering of noise outside the typical speech spectrum. Another "electrical characteristic" is the quality of the transmission channel. Non-ideal transformations of the signal, such as non-linearities, ticks, echoes, reverberation and drop-outs, obviously degrade recognition performance.
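The difference between acoustically distinct and acoustically similar vocabularies, described under Vocabulary choice above, can be approximated crudely by the phoneme-level edit distance between word transcriptions: CVC words that differ in a single phoneme lie at distance 1. The following Python sketch assumes that proxy. The functions `edit_distance` and `min_pairwise_distance` and the phoneme transcriptions are hypothetical illustrations, and real acoustic confusability also depends on which phonemes differ, not only on how many.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (lists of symbols)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def min_pairwise_distance(vocabulary):
    """Smallest edit distance between any two words; low values suggest a harder task."""
    return min(edit_distance(vocabulary[w1], vocabulary[w2])
               for w1, w2 in combinations(vocabulary, 2))


# Hypothetical phoneme transcriptions, for illustration only.
control_words = {
    "start": ["s", "t", "a:", "t"],
    "stop": ["s", "t", "Q", "p"],
    "left": ["l", "E", "f", "t"],
    "right": ["r", "aI", "t"],
}
cvc_words = {
    "bit": ["b", "I", "t"],
    "bet": ["b", "E", "t"],
    "pit": ["p", "I", "t"],
    "pin": ["p", "I", "n"],
}

print(min_pairwise_distance(control_words))  # 2
print(min_pairwise_distance(cvc_words))      # 1
```

In this toy example the closest control-word pair ("start"/"stop") still differs in two phonemes, whereas every CVC word has a neighbour one phoneme away, which is what makes such a list useful for diagnosis.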
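The level fluctuation described under Talking style can be quantified by measuring the level of short frames and taking the spread between the loudest and the quietest frame. The following Python sketch assumes a signal stored as a list of floats in the range [-1, 1], non-overlapping frames and a simple max-minus-min definition. `frame_levels_db` and `dynamic_range_db` are hypothetical names, and a practical measure would first discard silent frames.

```python
import math


def frame_levels_db(samples, frame_len):
    """RMS level of each non-overlapping frame in dB relative to full scale (1.0)."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20 * math.log10(max(rms, 1e-10)))  # floor avoids log10(0)
    return levels


def dynamic_range_db(samples, frame_len):
    """Spread between the loudest and the quietest frame, in dB."""
    levels = frame_levels_db(samples, frame_len)
    return max(levels) - min(levels)


# Hypothetical signal: a 200 Hz tone whose amplitude drops tenfold halfway,
# as if the speaker moved away from the microphone.
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
signal = [x * (1.0 if n < fs // 2 else 0.1) for n, x in enumerate(tone)]
print(round(dynamic_range_db(signal, frame_len=400), 1))  # 20.0
```

The tenfold amplitude drop appears as a 20 dB dynamic range, since 20 log10(10) = 20.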
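The practice described under Recording conditions, recording clean speech and adding noise afterwards, amounts to scaling a noise signal so that the mixture reaches a chosen SNR, where SNR = 10 log10(P_speech / P_noise) dB. The following Python sketch assumes purely additive noise, power averaged over the whole signal (no speech-activity weighting) and Gaussian white noise as a stand-in for environmental noise; the function names are hypothetical. Because the speech samples themselves are left unchanged, such a mixture cannot model the Lombard effect.

```python
import math
import random


def mean_power(x):
    """Mean power (mean of squared samples) of a signal."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR in dB; return (noisy, gain)."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    # SNR_dB = 10 * log10(P_clean / P_noise), so the required noise power is:
    target_noise_power = mean_power(clean) / 10 ** (snr_db / 10)
    gain = math.sqrt(target_noise_power / mean_power(noise))
    noisy = [c + gain * n for c, n in zip(clean, noise)]
    return noisy, gain


def measured_snr_db(clean, noisy):
    """SNR in dB, treating the difference noisy - clean as the noise component."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


random.seed(0)
fs = 8000
# Hypothetical "clean" signal: a 440 Hz tone standing in for speech.
clean = [0.5 * math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
# Gaussian white noise standing in for environmental noise.
noise = [random.gauss(0.0, 1.0) for _ in range(fs)]
noisy, gain = add_noise_at_snr(clean, noise, snr_db=10.0)
print(round(measured_snr_db(clean, noisy), 2))  # 10.0
```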
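The deliberate band limiting mentioned under Electrical characteristics can be sketched as a band-pass filter followed by downsampling. The following Python sketch assumes 16 kHz input, takes 300 to 3400 Hz as the nominal telephone band, and uses a Hamming-windowed-sinc FIR filter as one simple choice among many; all function names are hypothetical, and production code would normally rely on a DSP library. Removing everything above 4 kHz before keeping every second sample halves the data rate without aliasing, and the same filter removes noise outside the passband.

```python
import math


def lowpass_taps(cutoff_hz, fs, num_taps):
    """Hamming-windowed-sinc low-pass FIR coefficients; num_taps should be odd."""
    m = num_taps - 1
    fc = cutoff_hz / fs  # normalised cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        k = n - m / 2
        h = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)
        taps.append(h * window)
    return taps


def bandpass_taps(low_hz, high_hz, fs, num_taps):
    """Band-pass filter as the difference of two linear-phase low-pass filters."""
    hi = lowpass_taps(high_hz, fs, num_taps)
    lo = lowpass_taps(low_hz, fs, num_taps)
    return [a - b for a, b in zip(hi, lo)]


def fir_filter(x, taps):
    """Direct-form FIR convolution; output has the same length as the input."""
    y = []
    for i in range(len(x)):
        acc = 0.0
        for j in range(min(len(taps), i + 1)):
            acc += taps[j] * x[i - j]
        y.append(acc)
    return y


def telephone_band(x, fs=16000, num_taps=201):
    """Band-limit to about 300-3400 Hz, then keep every second sample.

    Assumes fs = 16000, so the output is sampled at 8 kHz.
    """
    y = fir_filter(x, bandpass_taps(300.0, 3400.0, fs, num_taps))
    return y[::2]


def tone_gain(freq_hz, fs=16000, num_taps=201, num_samples=2000):
    """Amplitude gain of a unit sine through telephone_band, skipping the start-up transient."""
    x = [math.sin(2 * math.pi * freq_hz * n / fs) for n in range(num_samples)]
    y = telephone_band(x, fs, num_taps)
    steady = y[num_taps // 2 + 1:]  # decimated samples whose filter window is full
    rms = math.sqrt(sum(v * v for v in steady) / len(steady))
    return rms * math.sqrt(2)  # a unit-amplitude sine has RMS 1/sqrt(2)


for f in (100.0, 1000.0, 6000.0):
    print(f, round(tone_gain(f), 3))
```

A 1000 Hz tone should pass with a gain close to 1, while tones at 100 Hz and 6000 Hz, outside the assumed telephone band, are strongly attenuated; the 6000 Hz tone would otherwise alias to 2000 Hz after downsampling.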


EAGLES SWLG SoftEdition, May 1997.