...HREF=#chsysdfig2#824>.
Courtesy of Roger Moore [Moore (1994a)].
...names.
This item was not relevant for VERBMOBIL-PHONDAT, which made use of an extended SAMPA notation (7-bit ASCII).
...dependencies
In linguistics, long range dependency refers rather to the relation between an antecedent pronominal item and the position where this item satisfies verb valency conditions, as in "Where did John tell Mary he thought Ted was convinced Henry had put it [EMPTY ADVERBIAL POSITION]?" It is known that these can also be handled by context-free mechanisms. (Editor's note.)
...PZM
PZM is a trademark of CROWN International Inc.
...above.
Though the organisation into these sections is convenient, the subdivision is to some extent artificial. The relationship between setting up corpora and testing recognisers is a case of the proverbial chicken and egg: poor performance of a recogniser can be due to training and testing on a poor corpus. In turn, speaker verification and dialogue systems depend to an extent on speech recognition.
...respected
For problems pertaining to written language (text) corpora, the results of the EAGLES Working Group on Corpora should be consulted.
...NAME="15113">.
This concept could be extended to the characterisation of voices modified by temporary external factors that affect speech production, such as alcohol.
...NAME="15117">.
It is essential to underline here that this sort of problem has not yet been solved, and probably never will be. In particular, lie detection from the speech signal is not considered a realistic research area.
...terms
For instance, some speech recognition systems use models of speech units that have variants across several speaker clusters. These clusters may be obtained in an unsupervised manner, and it is usually impossible to find a posteriori an objective attribute that would qualify each cluster.
...identity).
Note here that the term impostor covers two slightly different concepts: a non-registered speaker in identification, and a speaker claiming a false identity in verification.
...end.
Including a possible outcome of "none of the registered speakers" in the case of open-set labelling.
...NAME="15293">
For instance, the speaker's own name, or a personal identification number.
...password.
For instance, the voice request for a given protected service.
...systems
For instance, having the applicant speaker pronounce a new sequence of digits for each trial session.
...systems
For instance, the vowel [i], a nasal sound, the word /dog/, ...
...systems
However, such systems may be language dependent.
...speaker,
This term can be ambiguous in certain contexts, as it may also be understood as a speaker who is unknown to the system. Though it is frequently found in the literature, we do not recommend using it.
...NAME="20575">.
Usually, a speaker who is entitled to use the facilities to which access is restricted by the system.
...system.
For instance, for a spoken language identification system that discriminates between the languages spoken in Switzerland, a conform speaker (conformant speaker) is a speaker who speaks German, French, Italian or Romansch, but not some other language that the system does not expect.
...accordance.
For instance, a female speaker claiming that she is a female speaker, in sex verification.
...NAME="15351">,
Both terms are very rarely used.
...speaker.
For instance, a child claiming that he is an adult, in age verification.
...rejection:
Sometimes called type-I error.
...acceptance:
Sometimes called type-II error.
...wants,
Here, a further distinction could be made between language dependent and language independent systems.
...speaker.
For instance, in forensic applications, the speaker may not be physically present, or may not be willing to cooperate.
...implement.
On the telephone, for instance.
...likelihood
The author refers to a probabilistic model, namely a Hidden Markov Model. 
...former.
In fact, the system may be more effective at recognising the handset than the speaker.
...month).
One year would be ideal, to give a good representation of the influence of weather.
...NAME="15601">,
Except, of course, when the evaluation is carried out on contemporaneous speech.
...amount.
See for instance [Soong et al. (1987)] for an illustration.
...phenomena.
See again [Soong et al. (1987)] for an illustration.
...NAME="15642">.
In other words, it is intrinsically easier to identify 1 speaker among 10 than to identify 1 among 1000, and it is intrinsically easier to identify 1 male speaker among 1000 adult speakers rather than among 1000 male speakers.
...NAME="15653">,
Sometimes called background speakers.
...NAME="15669">
New term that generalises the concept of cohort.
...rejected.
This concept is no doubt largely academic.
...him.
This can be the case for forensic applications.
...positively.
Or who are not even aware that they are being recorded.
...systematically,
Except himself, if he is also a registered speaker.
...imposture.
This step is really necessary. It can happen that a system is more robust to same-sex imposture than to cross-sex imposture, in particular if the pseudo-impostor bundle of a given speaker is only composed of speakers of the same sex [Reynolds (1994)].
...evaluation:
The experiment reported here is hypothetical.
...reduction.
In fact, 576#576 would be the most relevant figure, especially for small error rates, but the intuitive meaning of the logarithm of an error rate is less immediate. This remark also holds for mistrust rates.
...NAME="16249">,
Sometimes familiarly called a sheep.
...NAME="16253">.
Sometimes familiarly called a goat.
...rate,
The familiar term ram could be used to extend the animal analogy.
...rate.
The familiar term lamb seems appropriate here.
...separately:
With 599#599 and 600#600 being respectively the number of male and female speakers whose identity was assigned at least once to a test utterance.
...NAME="16523">
This property can be useful for debugging a scoring program.
...speaker.
This remark is also true for mistrust rates, with the estimates of 621#621 being obtained from the output of the identification system.
...NAME="17372">,
Wolves, in our animal analogy.
...NAME="17376">
Badgers seems particularly appropriate!
...,
Another way of estimating the average false acceptance rate and the average imposture rate could consist in computing:


738#738

but these estimates have the drawback that, in general, 739#739.

...NAME="18809">
In theory, a system using speaker-dependent thresholds should never perform worse than the same system using a speaker-independent threshold, as the latter is ultimately a special case of the former. In the speaker-dependent approach, however, it may be difficult to obtain reliable estimates of each threshold.
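The "special case" argument can be illustrated with a minimal Python sketch. The scores, speaker labels and function names below are hypothetical, and the thresholds are tuned and scored on the same data, which is exactly the idealised situation the remark describes.

```python
def count_errors(genuine, impostor, threshold):
    """Errors at a threshold: genuine scores below it (false rejections)
    plus impostor scores at or above it (false acceptances)."""
    false_rejections = sum(1 for s in genuine if s < threshold)
    false_acceptances = sum(1 for s in impostor if s >= threshold)
    return false_rejections + false_acceptances


def best_threshold(genuine, impostor, candidates):
    """Candidate threshold giving the fewest errors on the given scores."""
    return min(candidates, key=lambda t: count_errors(genuine, impostor, t))


# Hypothetical scores per registered speaker: (genuine trials, impostor trials).
scores = {
    "spk_a": ([0.9, 0.8, 0.7], [0.5, 0.4, 0.6]),
    "spk_b": ([0.5, 0.45, 0.6], [0.2, 0.3, 0.1]),
}
candidates = sorted({s for g, i in scores.values() for s in g + i})

# Speaker-independent: one threshold shared by all speakers.
all_genuine = [s for g, _ in scores.values() for s in g]
all_impostor = [s for _, i in scores.values() for s in i]
shared = best_threshold(all_genuine, all_impostor, candidates)
errors_shared = sum(count_errors(g, i, shared) for g, i in scores.values())

# Speaker-dependent: each speaker gets the best threshold for their own scores.
errors_dependent = sum(
    count_errors(g, i, best_threshold(g, i, candidates))
    for g, i in scores.values()
)

print(errors_shared, errors_dependent)
```

Because the shared threshold is itself one of the candidates available to each speaker, the speaker-dependent error count cannot exceed the shared one on these data. With thresholds estimated from few trials and applied to new data, that guarantee no longer holds.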
...used:
Any other combination would not make much sense.
...NAME="19048">
This approach is related to the common practice of summarising the performance of a speaker verification system as the arithmetic mean of the false rejection rate and the false acceptance rate obtained for a particular threshold.
...better.
Hence, the geometric mean is usually a better way to average a false rejection rate and a false acceptance rate for a given threshold.
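As a small numerical illustration, the following Python sketch compares the arithmetic and geometric means of a false rejection rate and a false acceptance rate. The function names and the example rates are hypothetical and are not taken from any experiment.

```python
import math


def arithmetic_mean(fr_rate: float, fa_rate: float) -> float:
    """Arithmetic mean of a false rejection rate and a false acceptance rate."""
    return (fr_rate + fa_rate) / 2.0


def geometric_mean(fr_rate: float, fa_rate: float) -> float:
    """Geometric mean of the same two rates."""
    return math.sqrt(fr_rate * fa_rate)


# Hypothetical operating points, for illustration only.
balanced = (0.05, 0.05)      # false rejection = false acceptance = 5%
unbalanced = (0.099, 0.001)  # false rejection = 9.9%, false acceptance = 0.1%

for fr, fa in (balanced, unbalanced):
    print(fr, fa, arithmetic_mean(fr, fa), geometric_mean(fr, fa))
```

The two operating points share the same arithmetic mean (0.05), while their geometric means differ (0.05 against roughly 0.01). The geometric mean is the exponential of the average of the log rates, so it responds to ratios between rates rather than to absolute differences.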
...system.
In fact, a conventional EER  908#908 is the model EER  975#975 for any model, but with a 976#976-validity domain of 977#977!
...global
That is, gender-balanced, average or test set.
...tasks,
In particular, those for which any method will work!
...capability.
Especially for speaker identification when the number of speakers is large.
...conditions
A cold, for instance.
...NAME="19606">
The so-called Lombard effect.
...speaker.
Not to be recognised, or to be mistaken for somebody else, respectively.
...cf.,].
Moreover, systems have been developed that generate audio output together with visual output (text printed on a computer screen, for foreign language learning purposes) and/or tactile output (braille, specifically for the visually handicapped). In such cases the term multimodal output is often used.
...scope.
This intended use of objective quality measurement bears a superficial resemblance to the work by [Kryter (1962a), Kryter (1962b)] and associates on the development of the so-called Articulation Index (AI). The AI accurately predicts the loss in intelligibility of speech due to characteristics of the transmission channel by automatically comparing the long-term average spectral characteristics of the input and output speech. However, no AI-based approach can be used to discriminate between the desirable (speech-like) and the undesirable (noise, distortion) features of the input speech.
...account.
Clearly, there is a continuum from completely application independent at one end to completely application specific at the other.
...development).
English presents a special problem in the assignment of stress. The elements of English compounds are typically separated by spaces, so that each element is erroneously treated as a word by itself. Moreover, the stressing of compounds in English depends partly on the semantic relationship between the words that make up the compound, and partly on purely lexical factors. A comparison of English compound stress rules developed by linguists and decision rules automatically extracted from hand-labelled phonetic databases has been reported by [Sproat et al. (1992)].
...(3%).
On an isolated word basis, this latter category could be considered as an error; however, the error can very often be corrected post-hoc when strings of morphologically segmented words are analysed further by a syntactic parser.

...markers.
Later [Allen et al. (1987)], the duration rules were evaluated directly (objectively) by comparing the predicted segment durations with the segment durations as measured in spectrograms of new paragraphs read by the designated speaker. The rules accounted for 84% of the duration variance with a residual standard deviation of 17 ms (excluding the prediction of pause duration). Seventeen ms is generally less than the just noticeable difference for a duration change in a single segment in a sentence context [Klatt (1976)], which would explain why the human reference and the rule-derived durations were judged equally natural.
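The two figures quoted here (proportion of variance accounted for, residual standard deviation) can be computed from paired measured and predicted durations along the following lines. This is only a sketch under stated assumptions: the source does not say which definition was used, so the code assumes one minus the ratio of residual to total sum of squares, and a residual standard deviation taken about zero. The function name and the durations are hypothetical.

```python
import math


def duration_fit(measured_ms, predicted_ms):
    """Return (proportion of variance accounted for, residual SD in ms).

    Assumed definitions, not confirmed by the source:
    explained = 1 - SS_residual / SS_total, residual SD = sqrt(SS_residual / n).
    """
    n = len(measured_ms)
    mean_measured = sum(measured_ms) / n
    residuals = [m - p for m, p in zip(measured_ms, predicted_ms)]
    ss_total = sum((m - mean_measured) ** 2 for m in measured_ms)
    ss_residual = sum(r ** 2 for r in residuals)
    explained = 1.0 - ss_residual / ss_total
    residual_sd = math.sqrt(ss_residual / n)
    return explained, residual_sd


# Hypothetical segment durations in milliseconds, for illustration only.
measured = [60.0, 85.0, 120.0, 45.0, 150.0, 95.0]
predicted = [70.0, 80.0, 110.0, 55.0, 140.0, 100.0]

explained, residual_sd = duration_fit(measured, predicted)
print(f"variance accounted for: {explained:.1%}, residual SD: {residual_sd:.1f} ms")
```

The residual standard deviation is the quantity to set against a just noticeable difference, since both are expressed in milliseconds.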
...Test
The notion "function test" in this sense has no relationship with our use of the term "functional test". In the SAM Prosodic Function Test, prosodic quality is not being tested in a functional task: we are still dealing with intuitive judgments (ratings) of how well the melody would fulfil its function without actually testing it.
...built.
We will encounter a similar problem when we come to consider system evaluation. It is possible to evaluate the users' perceptions of the system's usability, but it is not possible to test its performance against a clear specification of what the system should be capable of, because this is, strictly speaking, unknown.
...necessary.
This use of the term language model should not be confused with current usage in speech technology to refer to stochastic grammars or word context models which limit the search space for word hypotheses in automatic speech recognition. See Chapter 7. [Technical Editor]
..."trouble".
"Trouble" is a general term to describe a broad class of dialogue problems such as those caused by speech recognition or parsing failures, misunderstandings, illogical or inconsistent utterances or belief states, etc.
...standard
Unicode is a trademark of Unicode, Inc.
...represented.
Suggestions for the transcription of pathological speech made at the Kiel IPA Convention envisage this sort of detail, but they are still under discussion and have not yet been considered with respect to computer transcription.
...utterance:
The symbol combinations used here are only examples, taken from a system used at UCL. They merely serve to illustrate the type of fine acoustic categories under discussion. Other symbols or symbol combinations could be and have been used by others to represent similar categories.
...life
y is years. Times given are manufacturers' warranties and claims respectively. For CD-Rs, most manufacturers offer only one-year warranties, which means that data has to be written to the CD-R within one year.
...
t is rewind time for the entire tape
...b
b is bits/s
...
cost estimates in US$ per device or Megabyte
...device
for devices with interchangeable medium only
...config
standard configuration of a high-end PC (Pentium or PowerPC processor) or middle range UNIX (Sparc) workstation
...paper
T.I. Boogaart (PTT Research), L. Bos (SPEX) and L. Boves (Nijmegen University). Paper presented at IVTTA 1994, Japan.
...(OGI)
OGI has collected an 11-language telephone-based corpus which has been used for common evaluation of language identification algorithms. This corpus is currently available through the LDC.

EAGLES SWLG SoftEdition, May 1997.