
Glossary

acceptance
Decision outcome which consists in responding positively to a speaker (or speaker class) verification task.

agent
In the context of interactive systems, ``agent'' usually refers to a DIALOGUE PARTICIPANT, that is, the dialogue system or the user. However, it may also be used to refer to a human operator who takes over when a telephone-based dialogue goes wrong (``Please hold on; this call will be transferred to an agent'').

alignment
In determining the performance of a continuous speech recognition system, the response of the recogniser has to be compared to the transcription of the utterance presented to the system. In this process, the two word strings have to be aligned in order to compare them.
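
Such an alignment is commonly computed with a minimum edit distance (Levenshtein) dynamic programme over words. The following is a minimal sketch, assuming unit costs for every error type; the function name `align_words` and the operation labels are illustrative and are not taken from the handbook.

```python
def align_words(reference, hypothesis):
    """Align two word lists with unit-cost edit distance.

    Returns a list of (op, ref_word, hyp_word) tuples, where op is
    "ok", "sub", "del" (reference word missing) or "ins" (extra word).
    """
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = fewest edits aligning reference[:i] with hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = reference[i - 1] != hypothesis[j - 1]
            cost[i][j] = min(cost[i - 1][j - 1] + mismatch,
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and reference[i - 1] != hypothesis[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            ops.append(("sub" if mismatch else "ok",
                        reference[i - 1], hypothesis[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("del", reference[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hypothesis[j - 1]))
            j -= 1
    ops.reverse()
    return ops
```

When several alignments share the minimum cost, this sketch returns only one of them; scoring tools may break such ties differently.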

analytic testing
Procedure in which the listener is instructed to evaluate specific aspects of the performance of a speech output system, e.g. suitability of tempo, quality of segments, appropriateness of word stresses, sentence accents, etc.

antonymy
Two words are antonyms (a) if they are co-hyponyms with respect to given meanings, and (b) if they differ in meaning in respect of those details of the same meaning which are not shared by their hyperonym.
Example: manual and novel are antonyms. Note that the term is sometimes restricted to binary oppositions, e.g. dead - alive.

applicant speaker
The speaker using a speaker recognition system at a given instant. Alternative terms: current speaker, test speaker, unknown speaker. The last of these can be ambiguous in certain contexts, as it may also be understood as a speaker who is unknown to the system. Though it is frequently found in the literature, we do not recommend using it.

application domain
An application domain is a particular DOMAIN to which a dialogue system may be applied (for example, training for air-traffic controllers, timetable information provision, etc.).

assessment
(of a recognition system) The process of determining the performance of the system and evaluating its usefulness for a particular application.

Audiotex
A system which plays pre-recorded messages to telephone callers provides an Audiotex service. The purpose of such services is to inform (e.g. weather forecasts, traffic information, etc.) or to entertain (e.g. horoscopes, joke lines, etc.). Audiotex services are usually made available with Premium Rate Tariffs. They tend to be tightly regulated, and they are not available in some countries.

automated speech output testing
Speech output assessment procedure in which the human observer (listener in the case of audio output, or linguist in the case of symbolic output) has been replaced (modelled) by an algorithm. Automated assessment presupposes that we know exactly how human observers evaluate differences between two (acoustic or symbolic) realisations of the same linguistic message.

automatic speech recognition system
A device that can recognise human speech and output the words that were spoken.

baseline reference (condition)
Speech output of a system that contains no specific intelligence.

benchmark
The value that characterises some reference system against which a newly developed system is (implicitly) set off.

benchmark test
An efficient, easily administered test, or set of tests, that can be used to express the performance of a speech output system (or some module thereof) in numerical terms.

black box approach
Performance evaluation of a system as a whole, typically used to compare systems developed by different manufacturers, or to establish the improvement of one system relative to an earlier edition (comparative testing). Black box evaluations consider the overall performance of a system without reference to any internal components or behaviours. Evaluations of this kind address large questions such as ``How good is it as an integrated system?'' rather than detailed questions of the ``What is its word recognition rate?'' variety. Compare GLASS BOX APPROACH.

canned speech
Speech which has been recorded for use in the prompts or information play-outs of a dialogue system is referred to as canned speech or canned messages. A number of canned messages can be played out one after the other to create a single system utterance. For example, the following system utterance consists of 13 canned messages (identified by <...>) concatenated together: ``<Flight> <XY> <five> <seven> <two> <from> <London> <to> <Brussels> <will arrive at> <fifteen> <thirty> <seven>.'' With careful attention to prosodic issues, canned speech can provide a high quality, natural-sounding interface. SPEECH SYNTHESIS, though less natural-sounding, is more flexible and thus more appropriate when lengthy or lexically rich system utterances are required.

categorical estimation
Rating method where the subject has to assign to (some aspect of) a speech output system a value from a limited range of prespecified values, e.g. ``1'' representing extremely poor and ``10'' excellent intelligibility.

co-hyponymy
Two words are co-hyponyms if and only if there is a word which is a hyperonym of each (in the same meaning of this word).
Example: manual and novel are co-hyponyms in relation to book.

common-password speaker recognition system
A text-dependent speaker recognition system for which all registered speakers have the same voice password.

communication media
Media (or ``means'') refer to materials or devices which are used by an interactive dialogue system to communicate with the user.

communication modes
A mode refers to the perceptual senses which allow for communication. The following modes may be identified: vocal, visual, auditory, tactile, olfactory.

comparative testing
See BLACK BOX APPROACH.

competence
(vs. performance) A technical term in theoretical linguistics. Competence is a speaker/hearer's knowledge of his own language. This is contrasted with PERFORMANCE, what speakers actually say. Thus, though my competence tells me that the past tense of the verb ``go'' is went, a host of factors including fatigue, distraction, or word-play may result in my performance production of the ill-formed *goed.

comprehension test
Procedure testing a listener's understanding of a speech stimulus at the sentence or text level (often by asking the listener to answer content questions).

concept-to-speech system
Speech output system that converts some abstract representation of a communicative intention to speech.

continuous speech/connected words
A speaking style where the words form a continuous signal, i.e. the words follow each other fluently. Contrasts with ISOLATED WORDS.

The distinction between ``connected words'' and ``continuous speech'' is somewhat technical. A connected word recogniser uses words as recognition units, which can be trained in an isolated word mode. Continuous speech is generally associated with large vocabulary recognisers that use phones as recognition units and can be trained with continuous speech.

correction rate (CR)
Percentage of all turns which are correction turns.
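
As a minimal sketch of this measure, assuming that every turn in a dialogue transcript has already been labelled by hand and that correction turns carry the (hypothetical) label "correction":

```python
def correction_rate(turn_labels):
    """Percentage of turns that are labelled as correction turns."""
    if not turn_labels:
        raise ValueError("need at least one turn")
    corrections = sum(1 for label in turn_labels if label == "correction")
    return 100.0 * corrections / len(turn_labels)
```

For a four-turn dialogue labelled request, correction, answer, correction, the function returns 50.0.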

cross-validation
Cross-validation is a technique in statistical estimation by which the parameters of a model are optimised on a new unseen test set. In the context of stochastic language modelling, cross-validation is used to estimate the smoothing parameters.

cut-through
The system hears and understands simultaneously (single step).

deletion
(or miss) A word in the utterance that is not recognised.

diagnostic testing
See GLASS BOX APPROACH.

dialogue act
This term is favoured by some authors who wish to appeal to the basic idea of SPEECH ACTS without buying into the whole philosophical apparatus of Speech Act Theory. The basic idea is that utterances can be categorised into broad classes such as questions, confirmations, statements, etc. In keeping with the practical engineering view which usually informs analyses which use the dialogue act notion, the exact inventory of categories tends to be determined by the particular needs of each dialogue application.

dialogue grammar
A grammar for describing a set of well-formed dialogues. The terminal symbols in a dialogue grammar are SPEECH ACT or DIALOGUE ACT labels (though for convenience these labels may also be treated as the start symbol for more conventional sentence or utterance grammars). A dialogue grammar might, for example, contain a rule which says that a simple information request consists of two turns, the first of which is a question, and the second of which is an answer. The philosophical roots of dialogue grammars lie in the field of DISCOURSE ANALYSIS.
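
The example rule mentioned above (an information request is a question followed by an answer) can be written down as a small context-free grammar over dialogue act labels. The sketch below is illustrative only: the symbol names `DIALOGUE` and `INFO_REQUEST`, the act labels, and the helper functions are assumptions, and the naive recogniser only handles grammars without left recursion.

```python
# Non-terminals map to lists of alternative expansions; any symbol that is
# not a key of the grammar is treated as a terminal dialogue act label.
GRAMMAR = {
    "DIALOGUE": [["INFO_REQUEST"], ["INFO_REQUEST", "DIALOGUE"]],
    "INFO_REQUEST": [["question", "answer"]],
}

def derivable_ends(symbol, acts, start, grammar):
    """Return every end index e such that acts[start:e] derives from symbol."""
    if symbol not in grammar:
        # Terminal: it must match the act label at the current position.
        if start < len(acts) and acts[start] == symbol:
            return {start + 1}
        return set()
    ends = set()
    for expansion in grammar[symbol]:
        positions = {start}
        for part in expansion:
            positions = {e for p in positions
                         for e in derivable_ends(part, acts, p, grammar)}
        ends |= positions
    return ends

def is_well_formed(acts, grammar=GRAMMAR, start_symbol="DIALOGUE"):
    """True if the whole sequence of act labels derives from the start symbol."""
    return len(acts) in derivable_ends(start_symbol, acts, 0, grammar)
```

With this toy grammar, the sequence question, answer, question, answer is accepted, while two questions in a row are rejected.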

dialogue history
A system-internal record of what has happened in a dialogue so far. The dialogue history provides the immediate context within which interpretation takes place.

dialogue manager
The component in an INTERACTIVE DIALOGUE SYSTEM which is responsible for maintaining dialogue coherence. Functions typically undertaken by a dialogue manager include the following:

dialogue participant
Each of the participants involved in a dialogue - those speaking and those listening - is a dialogue participant.

dictionary
(or lexicon) A lookup table of pronunciations of all the words a (continuous) speech recognition system is capable of recognising.

discounting
Discounting is a technique in the context of language model smoothing by which the relative frequencies are discounted to allow for unseen events.
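
One simple variant is absolute discounting, in which a fixed amount is subtracted from every non-zero count and the freed probability mass is given to the unseen events. The sketch below is an assumption-laden illustration rather than the handbook's own method: it works on unigram counts, shares the freed mass equally among unseen words, and expects every counted word to be in `vocabulary` and the discount to lie between 0 and 1.

```python
from collections import Counter

def absolute_discounting(counts, vocabulary, discount=0.5):
    """Return a probability for every word in `vocabulary`.

    Each seen word loses `discount` from its count; the freed probability
    mass is shared equally among the unseen words.
    """
    total = sum(counts.values())
    seen = [w for w in vocabulary if counts.get(w, 0) > 0]
    unseen = [w for w in vocabulary if counts.get(w, 0) == 0]
    freed = discount * len(seen) / total
    probs = {w: (counts[w] - discount) / total for w in seen}
    if unseen:
        for w in unseen:
            probs[w] = freed / len(unseen)
    else:
        # Nothing is unseen: hand the mass back in proportion to the counts.
        for w in seen:
            probs[w] += freed * counts[w] / total
    return probs

counts = Counter("a a a b".split())
probs = absolute_discounting(counts, ["a", "b", "c"], discount=0.5)
```

Here the unseen word "c" receives a non-zero probability, and the probabilities over the three-word vocabulary still sum to one.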

discourse analysis
The branch of linguistics which is concerned with the analysis of naturally occurring connected spoken or written discourse.

domain
The area of language usage for which a recognition system is designed to be used, a (possibly ill-defined) subset of general activity (such as business, avionics, aeronautics, medicine, transport, etc.) in which some coherent collection of TASKS may be carried out.

environment
The environment is the total context in which a recognition or interactive dialogue system is located. For example, a dashboard control system operates in an in-car environment. Environments may be characterised in many different ways. Most commonly, however, factors which might affect the performance of the system (such as high background noise) are singled out to describe environments.

error rate
The fraction of errors made by a recognition system, i.e. the number of errors divided by the number of words to be recognised. Often expressed as a percentage. See also DELETION, SUBSTITUTION and INSERTION.
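
As a worked illustration of this definition, the sketch below computes the error rate as a percentage from error counts that are assumed to have been obtained beforehand by aligning the response with the reference transcription. The function and parameter names are illustrative.

```python
def word_error_rate(substitutions, deletions, insertions, n_reference_words):
    """Errors divided by the number of words to be recognised, in percent."""
    if n_reference_words <= 0:
        raise ValueError("the reference must contain at least one word")
    errors = substitutions + deletions + insertions
    return 100.0 * errors / n_reference_words
```

For one substitution, one deletion and one insertion against a six-word reference, the result is 50.0. Because insertions count as errors, the value can exceed 100 when the response is much longer than the reference.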

estimator
A mathematical expression that can be used to estimate the value of a statistical property, such as the mean or variance.

event-dependent speaker recognition system
A text-independent speaker recognition system for which test utterances must contain a certain linguistic event (or class of events) while the rest of the acoustic material is discarded. This approach requires a preliminary step for spotting and localising the relevant events.

exchange
A pair of contiguous and related turns, one spoken by each party in the dialogue.

false (speaker) acceptance
(Sometimes called type-II error) Erroneous acceptance of an impostor in open-set speaker identification or in speaker verification.

false (speaker) rejection
(Sometimes called type-I error) Erroneous rejection of a registered speaker or of a genuine speaker in open-set speaker identification or speaker verification.
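
The rates of the two error types in speaker verification can be estimated from a set of scored trials. The following is a minimal sketch, assuming that the system produces a numeric score per trial, that higher scores mean a better match, and that a trial is accepted when its score reaches a decision threshold; all names are illustrative.

```python
def verification_error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_rejection_rate, false_acceptance_rate) as fractions.

    A trial is accepted when its score is greater than or equal to
    `threshold`.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("need at least one genuine and one impostor trial")
    false_rejections = sum(1 for s in genuine_scores if s < threshold)
    false_acceptances = sum(1 for s in impostor_scores if s >= threshold)
    return (false_rejections / len(genuine_scores),
            false_acceptances / len(impostor_scores))

frr, far = verification_error_rates(
    genuine_scores=[0.9, 0.8, 0.4, 0.7],
    impostor_scores=[0.1, 0.6, 0.3, 0.2],
    threshold=0.5,
)
```

Raising the threshold lowers the false acceptance rate at the price of more false rejections, and lowering it has the opposite effect.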

field testing
Speech output test procedure entirely run in the actual application, using the real-life situation with the actual end-users.

fixed-vocabulary speaker recognition system
A text-independent speaker recognition system for which test utterances are composed of words, the order of which varies across speakers and sessions, but for which all the words are pronounced at least once by the speaker when he registers with the system.

flawless speech
The unweighted reproducible 1:1 transduction of an acoustical signal emitted by a speaker into a sequence of 2 byte numbers that is free of any room or environment information, exhibits a sufficient signal-to-noise ratio of at least 50 dB, and has been produced under recording conditions that do not impose any stress upon the speaker in addition to what might be intended for a given talking situation.

formal language
An invented language, usually developed for purposes of representation and manipulation (for example, in mathematics, logic or semantics) and not for purposes of communication.

functional testing
Assessment of speech output in terms of how well a system actually performs (some aspect of) its communicative purpose.

genuine speaker
A speaker whose real identity is in accordance with the claimed identity. By extension: a speaker whose actual character and claimed class are in accordance (for instance, a female speaker claiming that she is a female speaker, in sex verification). Alternative terms: authentic speaker, true speaker, correct speaker.

glass box approach
Test methodology in which the effects of all modules in a text-to-speech system but one are kept constant, and the characteristics of the free module are systematically varied, so that any difference in the assessment of the system's output must be caused by the variations in the target module (diagnostic testing). Glass box testing presupposes that the researcher has control over the input and output of each individual module. Compare BLACK BOX APPROACH.

global testing
Procedure in which the listener is instructed to attend to the general performance of a speech output system, e.g. in terms of listening effort, acceptability, and naturalness.

grammar
A set of rules that define how the words in a language can follow each other. This can include information about the probability that a sequence of words occurs.

grapheme-phoneme conversion
Module within a text-to-speech system that accepts a full-blown orthographic input (i.e. the output of a preprocessor), and outputs a string of phonemes, often (but not necessarily) including (word) stress marks, (sentence) accent positions, and boundaries.

heterography
Two orthographic forms of the same word are heterographs.
Example: standardise - standardize /st{nd@daIz/.

heterophony
Two phonological forms of the same word are heterophones.
Example: either /aID@/ - /i:D@/ `disjunction'.

homography
Two words with the same orthographic form and different phonological forms are (heterophonic) homographs.
Example: row /roU/ `horizontal sequence', /raU/ `noise, quarrel'.

homonymy
Two words with the same orthographic and phonological forms, but different syntactic categories and/or meanings are homonyms.
Example: mate /meIt/ `friend' or `state of play in a chess game'.

homophony
Two words with the same phonological form and different orthographic forms are (heterographic) homophones.
Example: meet /mi:t/ `encounter' - meat /mi:t/ `edible animal tissue'.

human-computer interaction
Often abbreviated to HCI. Any interaction between a person and a computer. Some writers use human-computer dialogue as a synonym for HCI, while others use it to identify a subtype of HCI in which natural language is used as the primary or the only medium of communication. A genuine synonym for HCI is man-machine interaction (MMI).

human-human interaction
Any encounter between two (or more) people is a human-human interaction. Thus, a conversation is a human-human interaction. Human-human interactions are interesting to interactive dialogue technologists because of the light they may shed on HUMAN-COMPUTER INTERACTIONS. However, a growing body of findings shows that human-human and human-computer natural language dialogues differ systematically. Lessons for system design based on human-human dialogues must be interpreted in the light of these differences.

hyperonymy
If the meaning of one word is entailed by the meaning of another, it is a hyperonym of the other (a superordinate term relative to the other).
Example: book is a hyperonym of manual as the meaning of book is implied by the meaning of manual (in one of its meanings).

hyponymy
The converse of HYPERONYMY. If the meaning of one word entails the meaning of another, it is a hyponym of the other (a subordinate term relative to the other).
Example: manual is a hyponym of book as the meaning of manual implies the meaning of book.

identification test
Procedure by which the listener is asked to identify a speech stimulus in terms of some (closed or open) set of response alternatives (e.g. some or all of the phonemes in the language).

identity assignment
Decision outcome which consists in attributing an identity to an applicant speaker, in the context of speaker identification. For speaker classification, the term class assignment should be used instead.

impostor
In the context of speaker identification, an impostor is an applicant speaker who does not belong to the set of registered speakers. In the context of speaker verification, an impostor is a speaker whose real identity is different from his claimed identity. Alternative terms: impersonator, usurpator. (Both terms are very rarely used.) For speaker classification tasks, this concept is better rendered by the term discordant speaker (for instance, a child claiming that he is an adult, in age verification).

insertion
(or false alarm) A word in the response of a speech recognition system that was not in the utterance presented to it.

interaction
Communication of information between two AGENTS, in which (except for the special case of the initial TURN) an agent's contribution at any given point can be construed as a response to the previous turn or turns.

interactive dialogue system
A computer system capable of engaging in turn-by-turn communication with a human user. In the general case, communication between the person and the system could use any COMMUNICATION MODE or MEDIUM (or several simultaneously). In this chapter, however, the term is usually used more restrictively to apply to systems whose primary mode of communication is spoken natural language. See also INTERACTIVE VOICE RESPONSE and SPOKEN LANGUAGE DIALOGUE SYSTEM.

Interactive Voice Response (IVR)
Interactive Voice Response (IVR) is what the commercial world calls interactive dialogue. As such, its scope encompasses certain kinds of simple interaction which research scientists do not normally think of as dialogues. For example, a telephone caller calling a weather forecasting AUDIOTEX service might be asked to say one of the words ``today'', ``tomorrow'' or ``weekend''. On the basis of what the system recognises, a canned weather forecast will be played. This is an example of IVR which is also widely known as Voice Response (VR), and a system which supports VR is usually known as a Voice Response Unit (VRU).

interpolation
Interpolation or linear interpolation is a technique in the context of language model smoothing by which the relative frequencies of a specific model are interpolated with those of a more general model. The term interpolation is often synonymous with smoothing.
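
A minimal sketch of linear interpolation, assuming that the specific model is a bigram model, the general model is a unigram model, and a single fixed interpolation weight `lam` is used; the function names and the toy word sequence are illustrative.

```python
from collections import Counter

def interpolated_bigram(tokens, lam=0.7):
    """Build P(word | prev) = lam * bigram + (1 - lam) * unigram."""
    unigrams = Counter(tokens)
    histories = Counter(tokens[:-1])          # words that have a successor
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    def prob(prev, word):
        p_general = unigrams[word] / total
        if histories[prev] == 0:
            # History never observed: only the general model is available.
            return p_general
        p_specific = bigrams[(prev, word)] / histories[prev]
        return lam * p_specific + (1.0 - lam) * p_general

    return prob

tokens = "the flight to london the flight to brussels".split()
prob = interpolated_bigram(tokens, lam=0.7)
```

The pair ("to", "flight") never occurs in the toy data, yet `prob("to", "flight")` is non-zero because the unigram model contributes its share. In practice the weight would be estimated on held-out data rather than fixed by hand.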

isolated words
A speaking style where the words (or small phrases) are uttered separately, with small pauses in between. Contrasts with CONTINUOUS SPEECH.

judgment testing
Procedure whereby a group of listeners is asked to judge the performance of a speech output system along a number of rating scales. Also called opinion testing in telecommunication research.

laboratory testing
Speech output test procedure entirely run in a laboratory, either abstracting from in vivo complications or trying to simulate real-life situations.

language model
A language model in speech recognition is used to improve the recognition accuracy. Its task is to capture the redundancy inherent to the word sequences to be recognised. This redundancy may result from both the task specific constraints and general linguistic constraints.

leaving-one-out
Leaving-one-out is a special kind of cross-validation where no additional test set is needed. Instead it is generated from the training observations by leaving out one observation at a time.
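
The sketch below illustrates the idea on a list of word observations: each observation is held out in turn, and the remaining ones play the role of the training set. As an example use, it measures how often the held-out word is unseen in the remaining data. The function names are illustrative, and `observations` is assumed to be a Python list.

```python
def leave_one_out_splits(observations):
    """Yield (held_out, remaining) pairs, leaving out one observation at a time."""
    for i, held_out in enumerate(observations):
        yield held_out, observations[:i] + observations[i + 1:]

def unseen_rate(observations):
    """Fraction of held-out observations that never occur in the remaining data."""
    misses = sum(1 for held_out, rest in leave_one_out_splits(observations)
                 if held_out not in rest)
    return misses / len(observations)

words = "a a b c a b d".split()
rate = unseen_rate(words)
```

A held-out word is unseen in the rest exactly when it occurs only once overall, so the rate equals the number of singleton words divided by the number of observations (here 2/7).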

linguistic interface
First part of a text-to-speech system, which transforms spelling into an abstract phonological code (which in turn is converted to sound by the acoustic interface). The linguistic interface includes text preprocessing, grapheme-phoneme conversion, assignment of (word) stress, (sentence) accent, and boundary positions, and choice of intonation pattern.

Lombard-effect
The effect that humans speak at a higher level (use more vocal effort) in conditions of higher environmental noise.

m-gram model
An m-gram model is a stochastic language model that is based on conditional probabilities depending only on the (m-1) immediate predecessor words.
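
A minimal sketch of such a model, assuming unsmoothed relative frequencies estimated from a single token sequence; the function names are illustrative. Unseen m-grams receive probability zero here, which is exactly the problem that SMOOTHING addresses.

```python
from collections import Counter

def train_mgram(tokens, m):
    """Relative-frequency m-gram model: P(word | previous m-1 words)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    ngram_counts = Counter(tuple(tokens[i:i + m])
                           for i in range(len(tokens) - m + 1))
    history_counts = Counter()
    for ngram, count in ngram_counts.items():
        history_counts[ngram[:-1]] += count

    def prob(word, history):
        # Only the (m-1) immediate predecessor words are used.
        h = tuple(history[-(m - 1):]) if m > 1 else ()
        if history_counts[h] == 0:
            return 0.0
        return ngram_counts[h + (word,)] / history_counts[h]

    return prob

tokens = "the flight to london the flight to brussels".split()
bigram = train_mgram(tokens, 2)
trigram = train_mgram(tokens, 3)
```

With the trigram model, `trigram("london", ["the", "flight", "to"])` ignores "the" and conditions only on ("flight", "to").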

magnitude estimation
Rating method where the subject is presented with an (auditory) stimulus and is asked to express the perceived strength/quality of the relevant attribute (e.g. intelligibility) numerically (``type in a value'') or graphically (``draw a line on the computer screen'').

(speaker) misclassification
Erroneous identity assignment to a registered speaker in speaker identification.

mistaken speaker
The registered speaker owning the identity assigned erroneously to another registered speaker by a speaker identification system.

modalities
Modalities concern the way a communicating agent/party uses a mode. For speech, different modalities may be identified according to whether continuous speech or isolated words are used, whether a whispering or shouting style is used, etc.

morphological decomposition
Analysis of orthographic words into morphemes, i.e. elements belonging to the finite set of smallest subword parts with an identifiable meaning. Morphological decomposition is necessary when the language/spelling allows words to be strung together without intervening spaces or hyphens.

natural language
Any non-invented language is a natural language. Thus, even the language used between people and invented systems can be termed ``natural'' if it is what users spontaneously produce in response to the situation. Natural languages can be contrasted with FORMAL LANGUAGES. See also RESTRICTED LANGUAGE and SUBLANGUAGE.

off-line testing
Procedure in which subjects are given some time to reflect before responding to a (spoken) stimulus.

on-line testing
Procedure that requires an immediate response from the subjects, tapping the perception process before it is finished.

opinion testing
See JUDGMENT TESTING.

oral dialogue
See SPOKEN LANGUAGE DIALOGUE. This term is quite widely used, though it is less favoured by native speakers of English than by those who have learned it as a second language.

paired comparison
A psychophysical method. It is used when subjects are required to judge between two stimuli. In LES this might be judging which of two recogniser outputs has more or less intelligibility.

parametric and non-parametric tests
A distinction between two basic forms of statistical tests employed in simple hypothesis testing. Parametric tests are used when continuous measures are available. Non-parametric tests are used otherwise.

party
See DIALOGUE PARTICIPANT.

performance
(vs. competence) A term from theoretical linguistics to describe what speakers actually say. This is contrasted with COMPETENCE, what speaker/hearers know about their language. It is generally held that there is a dislocation between competence and performance such that there is not a straightforward mapping from one to the other.

performance evaluation
See BLACK BOX APPROACH.

perplexity
A measure for the complexity of a grammar.

The (corpus) perplexity is a quantitative measure of the redundancy (or difficulty) of a recognition task for a given text corpus and a given language model. It measures how well the word sequences can be predicted by the language model.
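
Corpus perplexity is usually computed as the inverse geometric mean of the probabilities that the language model assigns to the words of the corpus. The sketch below assumes that these per-word conditional probabilities have already been obtained from some language model; the function name is illustrative.

```python
import math

def perplexity(word_probabilities):
    """exp of the negative mean log-probability per word."""
    if not word_probabilities:
        raise ValueError("need at least one word probability")
    if any(p <= 0.0 for p in word_probabilities):
        # A single zero-probability word makes the corpus "impossible".
        return math.inf
    log_sum = sum(math.log(p) for p in word_probabilities)
    return math.exp(-log_sum / len(word_probabilities))
```

If every word is predicted with probability 0.25, the perplexity is 4 (up to rounding), which matches the intuition of choosing uniformly among four equally likely words at each position.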

personal-password speaker recognition system
A text-dependent speaker recognition system for which each registered speaker has his own voice password.

phone
A subword unit of speech that represents a particular sound.

phonetically balanced sentences
Sentences containing phonemes according to their frequency of occurrence in a given language.

phonetically rich sentences
Sentences containing approximately uniform phoneme frequency distributions.

population
The collection of all objects that are of interest for the task in hand.

prosody
Those properties of speech utterances that cannot be derived in a straightforward fashion from the identity of the vowel and consonant phonemes that are strung together in the linguistic representation underlying the speech utterance, e.g. intonation (i.e. speech melody), word and phrase boundaries, (word) stress, (sentence) accent, tempo, and changes in speaking rate.

register
A term from sociolinguistics which is used to identify a language variety according to its use. Every speaker of a NATURAL LANGUAGE has command of a multitude of different registers. For example, the variety of language used in a social gathering with old friends is very different to that used with a doctor in a medical surgery. Context of use can affect all aspects of language use: the choice of words, the kind of syntactic constructions, accent, etc. Register may vary during the course of a single interaction. So, for example, a very formal register may be used when people first meet but, as the conversation develops, a more relaxed and informal register may take over. For this reason it is inappropriate to try to identify what might be called ``the register for human-computer dialogue'' because such a thing is unlikely to exist as a unitary phenomenon. Instead, it is usual to try to model language over the range of varieties which might be used in some given application domain. This model is usually called a SUBLANGUAGE.

registered speaker
A speaker who belongs to the list of registered users for a given speaker recognition system (usually a speaker who is entitled to use the facilities, the access of which is restricted by the system). For speaker classification systems, we propose the term conform speaker to qualify a speaker who belongs to one of the classes of speakers for a given speaker classification system. For instance, for a spoken language identification system that discriminates between languages spoken in Switzerland, a conform speaker is a speaker who speaks either German, French, Italian or Romansch, but not a language that the system does not expect. Alternative terms: reference speaker, valid speaker, authorised speaker, subscriber, client.

rejection
Decision outcome which consists in refusing to assign a registered identity (or class) in the context of open-set speaker identification or classification, or which consists in responding negatively to a speaker (class) verification trial.

restricted language
A variety of NATURAL LANGUAGE which is restricted by externally imposed rules of use. These rules typically limit the vocabulary and the range of acceptable syntactic constructions. Restricted languages tend to be used in contexts where rapid, effective communication of a small set of basic facts is paramount, for example, in air traffic control. Because of the tightly constrained nature of restricted languages, they are seen by many to be good candidates for modelling in interactive dialogue systems. However, this advantage must be weighed against the safety-critical function of many such languages in real use.

sample
Typically, a measure cannot be taken on all units of a population. In these cases, a sample is taken. Provided precautions are taken as set out in the text, this sample may be used to study the variable of concern in the population.

segments
Consonants and vowels of a language.

signal detection theory
A model that may be used for studying speech recogniser performance. The basic idea behind signal detection theory is that errors convey information concerning how the system is operating (in this respect, it is an advance on simple error measures).

signal-to-noise ratio
The ratio of information-carrying signals (speech) to background noise. Expressed in dB.
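
A minimal sketch of the computation, assuming that the ratio is taken between mean signal power and mean noise power and that separate sample sequences for speech and for background noise are available; the function name is illustrative.

```python
import math

def snr_db(signal_samples, noise_samples):
    """10 * log10 of mean signal power over mean noise power."""
    p_signal = sum(x * x for x in signal_samples) / len(signal_samples)
    p_noise = sum(x * x for x in noise_samples) / len(noise_samples)
    if p_signal == 0.0 or p_noise == 0.0:
        raise ValueError("signal and noise power must both be non-zero")
    return 10.0 * math.log10(p_signal / p_noise)
```

A signal whose amplitude is ten times that of the noise has one hundred times the power, which corresponds to about 20 dB.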

smoothing
Smoothing is a method that is needed in the context of stochastic language modelling to counteract the effect of sparse training data. The goal of smoothing is to guarantee that all probabilities are different from zero.

speaker classification
Any decision-making process that uses some features of the speech signal to determine some characteristics of the speaker of a given utterance.

speaker recognition
Any decision-making process that uses some features of the speech signal to determine some information on the identity of the speaker of a given utterance.

speaker class identification
Any decision-making process that uses some features of the speech signal to determine the class to which the speaker of a given utterance belongs.

speaker class verification
Any decision-making process that uses some features of the speech signal to determine whether the speaker of a given utterance belongs to a given class.

speaker identification
Any decision-making process that uses some features of the speech signal to determine who the speaker of a given utterance is.

speaker verification
Any decision-making process that uses some features of the speech signal to determine whether the speaker of a given utterance is a particular person, whose identity is specified.

speech act
A speech act is the informational action that a speaker effects by producing an utterance. For example, asking a question, offering information, and making a promise are three different types of speech act. The basic idea of speech acts is vitally important in work on dialogue systems. Speech acts serve as the base level of categorisation for dialogue work (in much the way that word classes have that function at the lexical level). So, for example, DIALOGUE GRAMMARS can be written which describe well-formed sequences of speech acts.

Many researchers working on interactive dialogue systems wish to use the notion of speech act without enlisting the whole philosophical apparatus of Speech Act Theory [Austin (1962), Searle (1969)]; for this purpose the term DIALOGUE ACT has been coined and is steadily growing in acceptability.

speech output assessment
See SPEECH OUTPUT TESTING.

speech output system
Some artifact, either a dedicated machine or a computer programme, that produces signals that are intended to be functionally equivalent to speech produced by humans. In the present state of affairs speech output systems generally produce audio signals only, but laboratory systems are being developed that supplement the audio signal with the visual image of the (artificial) talker's face.

speech output testing
Determination of the quality of (some aspect(s) of) a speech output system.

speech output evaluation
See SPEECH OUTPUT TESTING.

speech synthesis
Speech Synthesis is the name given to the production of speech sounds by a machine. Most speech synthesisers take a text string as input and produce a spoken version of the text as output. Some systems allow the text string to be annotated with prosodic markers which result in changes to the intonational pattern of the speech produced.

spoken language corpus
Any collection of speech recordings which is accessible in computer readable form and which comes with annotation and documentation sufficient to allow re-use.

spoken language dialogue
Also known as ORAL DIALOGUE. A complete spoken verbal interaction between two parties (in the present case, a system and a human being), each of whom is capable of independent actions. A dialogue is composed of a sequence of steps which are, in some way, related and build on each other. Dialogue systems are thus more sophisticated than question/answer systems, in which one agent may pose a succession of unrelated queries to the other agent.

spoken language dialogue system
A variety of INTERACTIVE DIALOGUE SYSTEM in which the primary mode of communication is spoken natural language. Spoken language dialogue systems take human-human conversation as their inspiration, though differences are bound to persist into the foreseeable future by virtue of the character of such systems as constrained, designed artifacts. Spoken language dialogue systems support a much more natural kind of dialogue than INTERACTIVE VOICE RESPONSE systems.

spoken language identification
Any decision-making process that uses some features of the speech signal to determine what language is spoken in a given utterance.

spoken language verification
Any decision-making process that uses some features of the speech signal to determine whether the language spoken in a given utterance is a particular language.
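The difference between identification and verification can be sketched as two decision rules over the same per-language scores. This is only an illustration: the scores, the threshold and the function names below are assumptions made up for the example, not part of the Handbook.

```python
def identify_language(scores):
    """Identification: pick the language with the highest score."""
    return max(scores, key=scores.get)

def verify_language(scores, claimed, threshold):
    """Verification: accept or reject one particular language."""
    return scores[claimed] >= threshold

# Hypothetical log-likelihood scores for one utterance.
scores = {"English": -41.2, "French": -39.8, "German": -45.0}

identified = identify_language(scores)                 # "French"
is_english = verify_language(scores, "English", -40.0)  # False
is_french = verify_language(scores, "French", -40.0)    # True
```

Identification always returns one of the known languages, whereas verification answers a yes/no question and therefore depends on the choice of threshold.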

stochastic grammar
A stochastic grammar is a stochastic language model that is based on a (context free) grammar; the grammar rules are assigned probabilities such that each word string generated by the grammar has a non-zero probability.
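As a minimal sketch of how rule probabilities yield a probability for a derivation, the toy grammar below multiplies the probabilities of the rules used. The grammar, its probabilities and the function name are illustrative assumptions, not taken from the Handbook.

```python
from math import prod

# Hypothetical toy grammar: for each left-hand side, the rule
# probabilities sum to 1.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("trains",)): 0.6,
    ("NP", ("planes",)): 0.4,
    ("VP", ("leave",)): 0.7,
    ("VP", ("arrive",)): 0.3,
}

def derivation_probability(derivation):
    """Multiply the probabilities of the rules used in one derivation."""
    return prod(RULES[rule] for rule in derivation)

derivation = [
    ("S", ("NP", "VP")),
    ("NP", ("trains",)),
    ("VP", ("leave",)),
]
p = derivation_probability(derivation)  # 1.0 * 0.6 * 0.7
```

In this unambiguous toy grammar each word string has a single derivation; in general the probability of a word string is the sum over all of its derivations.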

stochastic language model
A stochastic language model is a language model that assigns probabilities to the allowed word sequences; typically all word sequences have a non-zero probability.
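One common way to ensure that every word sequence over a fixed vocabulary receives a non-zero probability is smoothing. The sketch below uses a bigram model with add-one smoothing; the choice of model, the training sentences and all names are assumptions made for illustration only.

```python
from collections import Counter

def train_bigram(sentences, vocabulary):
    """Return a function giving add-one-smoothed bigram probabilities."""
    context_counts, bigram_counts = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words  # "<s>" marks the sentence start
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    size = len(vocabulary)

    def prob(word, prev):
        # Adding 1 to every count keeps unseen bigrams above zero.
        return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + size)

    return prob

def sequence_probability(words, prob):
    """Multiply the bigram probabilities along a word sequence."""
    p, prev = 1.0, "<s>"
    for word in words:
        p *= prob(word, prev)
        prev = word
    return p

vocab = {"show", "me", "flights", "trains"}
prob = train_bigram([["show", "me", "flights"], ["show", "me", "trains"]], vocab)
seen = sequence_probability(["show", "me", "flights"], prob)
unseen = sequence_probability(["flights", "me", "show"], prob)
```

The word order that never occurred in training still gets a small positive probability, while the observed order scores higher.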

sublanguage
The subpart of some NATURAL LANGUAGE which is deemed to be relevant to some given task and/or application domain. Interactive dialogue systems are not currently capable of modelling an average speaker's entire linguistic competence, so the normal approach is to identify and model only the sublanguage which is relevant to the function or functions which the interactive dialogue system is intended to perform. The idea of sublanguage is related to, but distinct from, the linguistic notion of REGISTER. A sublanguage in the context of interactive dialogue systems should not be confused with a sublanguage in the mathematical sense. In the latter case, the language of which the sublanguage is a part is formally well-defined; in the former case it is not.

substitution
(or misclassification) A response of a recogniser that is different from the word in the utterance presented to a recognition system.
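For continuous speech, substitutions are usually counted after aligning the recogniser's response with the transcription of the utterance. The sketch below uses a minimum-edit-distance alignment with unit costs, which is one common choice and an assumption here; the function name and the example utterance are also illustrative.

```python
def count_errors(reference, response):
    """Align two word lists by minimum edit distance and count the
    substitutions, deletions and insertions on one optimal alignment."""
    n, m = len(reference), len(response)
    # cost[i][j] = minimum edits turning reference[:i] into response[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = 0 if reference[i - 1] == response[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + diff,  # match or substitution
                             cost[i - 1][j] + 1,         # deletion
                             cost[i][j - 1] + 1)         # insertion
    # Trace back from the end to classify each edit.
    counts = {"substitutions": 0, "deletions": 0, "insertions": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diff = 0 if reference[i - 1] == response[j - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + diff:
                counts["substitutions"] += diff
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["deletions"] += 1
            i -= 1
        else:
            counts["insertions"] += 1
            j -= 1
    return counts

reference = "show me flights to boston".split()
response = "show me lights boston".split()
errors = count_errors(reference, response)
# errors == {"substitutions": 1, "deletions": 1, "insertions": 0}
```

Several alignments can share the same minimum cost, so which word pair is reported as the substitution may depend on tie-breaking; the totals in this example do not.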

synonymy
Two words are synonyms if and only if they have the same meaning (or at least have one meaning in common), i.e. if the meaning of each entails the meaning of the other. They are partial synonyms if either has additional readings not shared by the other. They are full synonyms if neither has any reading which is not shared by the other.
Example: manual and handbook are partial synonyms (manual is also, among other things, a term for a traditional organ keyboard). Full synonyms are rare. By implication, synonyms are also co-hyponyms.

system correction rate (SCR)
Percentage of all system turns which are correction turns.
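As a small worked example, assuming each system turn in a dialogue has already been labelled as a correction turn or not (the labels and the function name are hypothetical):

```python
def correction_rate(turns):
    """Percentage of turns that are flagged as correction turns."""
    if not turns:
        raise ValueError("no turns to score")
    corrections = sum(1 for is_correction in turns if is_correction)
    return 100.0 * corrections / len(turns)

# True marks a correction turn; 2 of these 8 system turns are corrections.
system_turns = [False, True, False, False, True, False, False, False]
scr = correction_rate(system_turns)  # 25.0
```

Applied to the user's turns instead, the same calculation would give the user correction rate.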

system-in-the-loop
A speech data collection method which involves getting subjects to use an existing spoken language dialogue system, and recording what they say.

task
A task consists of all the activities which a user must carry out in order to attain a fixed objective in some DOMAIN.

task-oriented dialogue
A dialogue concerning a specific subject, aiming at an explicit goal (such as resolving a problem or obtaining specific information). For example, dialogues concerned with obtaining travel information or booking theatre tickets are task-oriented.

text-dependent speaker recognition system
A speaker recognition system for which the training and test speech utterances are composed of exactly the same linguistic material, in the same order (typically, a password).

text-independent speaker recognition system
A speaker recognition system for which the linguistic content of test speech utterances varies across trials.

text preprocessing
The first stage of the linguistic interface of a text-to-speech system, which handles punctuation marks and other non-alphabetic textual symbols (e.g. parentheses), and expands abbreviations, acronyms, numbers, special symbols, etc. to full-blown orthographic strings.
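A very reduced sketch of such a stage is given below. It assumes whitespace-separated tokens, a tiny hand-made abbreviation table and numbers from 0 to 99 only; all names and table entries are illustrative, and a real system would need context to resolve ambiguous abbreviations (e.g. "St." as "Street" or "Saint").

```python
import re

# Hypothetical abbreviation table.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street"}
ONES = ("zero one two three four five six seven eight nine ten eleven "
        "twelve thirteen fourteen fifteen sixteen seventeen eighteen "
        "nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()

def number_to_words(n):
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    word = TENS[tens - 2]
    return word if ones == 0 else f"{word}-{ONES[ones]}"

def preprocess(text):
    """Drop parentheses and expand abbreviations and small numbers."""
    out = []
    for token in text.split():
        token = token.strip("()")  # remove non-alphabetic bracket symbols
        if not token:
            continue
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"\d{1,2}", token):
            out.append(number_to_words(int(token)))
        else:
            out.append(token)
    return " ".join(out)

expanded = preprocess("Dr. Smith lives at 42 Elm St. (flat 7)")
# expanded == "Doctor Smith lives at forty-two Elm Street flat seven"
```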

text-prompted speaker recognition system
A speaker recognition system for which, during the test phase, a written text is prompted (through an appropriate device) to the user, who has to read it aloud.

text-to-speech system
Speech output system that converts orthographic text (generally stored in a computer memory as ASCII codes) into speech.

topline reference (condition)
Speech output that represents optimum performance, typically by a professional human talker.

training
The process in which a speech recognition system learns the pronunciation of words to be recognised at a later time.

transaction
The part of a dialogue devoted to a single high-level task (for example, making a travel booking or checking a bank account balance). A transaction may be coextensive with a dialogue, or a dialogue may consist of more than one transaction.

turn
A stretch of speech, spoken by one PARTY in a dialogue. A stretch of speech may contain several linguistic acts or actions. A dialogue consists of a sequence of turns produced alternately by each party. Turns are also known as utterances.

unprompted speaker recognition
A speaker recognition system using totally spontaneous speech, i.e. one for which the user is totally free to utter what he wants (here, a further distinction could be made between language-dependent and language-independent systems), or for which the system has no control over the speaker (for instance, in forensic applications, the speaker may not be physically present, or may not be willing to cooperate).

unrestricted text-independent speaker recognition system
A text-independent speaker recognition system for which no constraints apply regarding the linguistic content of the test speech material.

user correction rate (UCR)
Percentage of all user turns which are correction turns.

violated speaker
The registered speaker owning the identity assigned erroneously to an impostor in open-set speaker identification. The registered speaker owning the identity claimed by a successful impostor in speaker verification.

vocabulary
The set of words that an automatic speech recognition system is capable of recognising.

voice characteristics
Those aspects of speech which remain relatively constant over longer stretches of speech, and constitute the background against which segmental and prosodic variation is produced and perceived (e.g. mean pitch level, mean loudness, mean tempo, harshness, creak, whisper, tongue body orientation, dialect).

voice-prompted speaker recognition system
A speaker recognition system for which, during the test phase, the user has to repeat a speech utterance, which he listens to through an audio device.

voice quality
See VOICE CHARACTERISTICS.

voice stop
In a first step the system hears; it needs a second step to understand.

Wizard-of-Oz simulation
Simulation of the behaviour of an interactive automaton by a human being. This can be done (i) by speaking to the user in a disguised voice, (ii) by choosing and triggering system predefined responses, (iii) by manually modifying some parameters of the simulation system, or (iv) by using a person to simulate the integration of existing system components (a bionic Wizard-of-Oz simulation).

word graph or word lattice
A word graph or lattice is used in the context of search in speech recognition to provide an explicit interface between the acoustic recognition and the application of the language model. The word graph or lattice should contain the most likely word hypotheses; in addition to each word hypothesis, the start and end times, the nodes and an acoustic probability are given.
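One possible in-memory representation is a list of edges, each carrying the fields named above. The sketch below is an assumption-laden illustration: the class and field names, the example hypotheses and their scores are invented, node numbers are assumed to increase along every edge, and a context-independent word score stands in for the language model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WordEdge:
    word: str
    start_node: int
    end_node: int
    start_time: float        # seconds
    end_time: float          # seconds
    acoustic_logprob: float  # log of the acoustic probability

def best_path(edges, start_node, final_node, lm_logprob):
    """Return (score, words) for the highest-scoring path.

    Assumes start_node < end_node on every edge, so processing edges
    in order of start_node visits the nodes in topological order.
    """
    best = {start_node: (0.0, [])}
    for edge in sorted(edges, key=lambda e: e.start_node):
        if edge.start_node not in best:
            continue  # node not reachable from the start
        score, words = best[edge.start_node]
        candidate = score + edge.acoustic_logprob + lm_logprob(edge.word)
        if edge.end_node not in best or candidate > best[edge.end_node][0]:
            best[edge.end_node] = (candidate, words + [edge.word])
    return best[final_node]

# Hypothetical lattice with two competing hypotheses per time span.
edges = [
    WordEdge("show", 0, 1, 0.0, 0.3, -1.0),
    WordEdge("so", 0, 1, 0.0, 0.3, -1.5),
    WordEdge("flights", 1, 2, 0.3, 0.9, -2.0),
    WordEdge("lights", 1, 2, 0.3, 0.9, -1.8),
]
# Hypothetical language model scores (log probabilities per word).
LM = {"show": -1.0, "so": -2.0, "flights": -1.0, "lights": -3.0}

score, words = best_path(edges, 0, 2, LM.get)
# words == ["show", "flights"]
```

Here "lights" has the better acoustic score, but the language model score makes "flights" the preferred hypothesis, which is the kind of interaction the word graph is meant to expose.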



EAGLES SLWG SoftEdition, May 1997.