- ...
- Courtesy of Roger Moore [Moore (1994a)].
- ...names.
- This item was not relevant for
VERBMOBIL-PHONDAT, which made use of an extended SAMPA
notation (7-bit ASCII).
- ...dependencies
- In linguistics, long-range dependency refers rather to the relation
between an antecedent pronominal item and the position where this item
satisfies verb valency conditions, as in
"Where did John tell Mary he thought Ted was convinced Henry had
put it [EMPTY ADVERBIAL POSITION]?" It is known that such dependencies can also
be handled by context-free mechanisms. (Editor's note.)
- ...PZM
- PZM is a trademark of CROWN International Inc.
- ...above.
- Though the organisation into these sections is convenient, the
subdivision is to some extent artificial. The
relationship between setting up corpora and testing recognisers is
a case of the proverbial chicken and egg: poor
performance of a recogniser can be due to training and testing on
a poor corpus. In turn, speaker verification and
dialogue systems depend to an extent on speech recognition.
- ...respected
- For problems pertaining to written language (text) corpora,
the results of the EAGLES Working Group on Corpora should be consulted.
- ...
- This concept could be extended to the
characterisation of voices modified by temporary external factors that
affect speech production, such as alcohol.
- ...
- It is essential to underline here that this
sort of problem is not yet solved, and probably never will be. In
particular, lie detection from the speech signal is not considered a
realistic research area.
- ...terms
- For instance, some speech recognition systems use
models of speech units that have variants across several speaker
clusters. These clusters may be obtained in an unsupervised manner, and it
is usually impossible to find, a posteriori, an objective attribute that
characterises each cluster.
- ...identity).
- Note here that the term impostor covers two slightly different concepts:
a non-registered speaker in identification, and a speaker claiming a false
identity in verification.
- ...end.
- Including the possible outcome "none of the registered speakers",
in the case of open-set labelling.
- ...
- For instance, the speaker's own name, or a personal
identification number.
- ...password.
- For instance, the voice request for a given protected service.
- ...systems
- For instance, having the applicant speaker pronounce a new sequence
of digits for each trial session.
- ...systems
- For instance, the vowel [i], a nasal sound, the word /dog/, etc.
- ...systems
- However, such systems
may be language dependent.
- ...speaker,
- This term can be ambiguous in certain contexts, as it may
also be understood as a speaker who is unknown to the system. Though it is
frequently found in the literature, we do not recommend using it.
- ...
- Usually, a speaker who
is entitled to use the facilities to which the system restricts access.
- ...system.
- For instance, for
a spoken language identification system that discriminates between the languages
spoken in Switzerland, a conform speaker (conformant speaker) is one who speaks
German, French, Italian or Romansch, and not some other language that the
system does not expect.
- ...accordance.
- For instance, a female
speaker claiming that she is a female speaker, in sex
verification.
- ...
- Both terms are very rarely
used.
- ...speaker.
- For instance, a child claiming that he is an
adult, in age verification.
- ...rejection:
- Sometimes called type-I error.
- ...acceptance:
- Sometimes called type-II error.
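The two error types named in these notes can be counted directly from verification scores. The following is a minimal sketch, assuming that a higher score means a better match and that a claim is accepted when its score reaches the threshold; the function name `error_rates` and the scores are hypothetical and serve only as an illustration.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_rejection_rate, false_acceptance_rate).

    Assumption of this sketch: a claim is accepted when score >= threshold.
    Systems that use distances (lower is better) would invert the tests.
    """
    # A false rejection is a genuine claim that falls below the threshold.
    false_rejections = sum(1 for s in genuine_scores if s < threshold)
    # A false acceptance is an impostor claim that reaches the threshold.
    false_acceptances = sum(1 for s in impostor_scores if s >= threshold)
    frr = false_rejections / len(genuine_scores)
    far = false_acceptances / len(impostor_scores)
    return frr, far


# Hypothetical scores, for illustration only.
genuine = [0.9, 0.8, 0.4, 0.7]
impostor = [0.1, 0.6, 0.3, 0.2, 0.5]
frr, far = error_rates(genuine, impostor, threshold=0.55)
print(f"false rejection rate:  {frr:.2f}")
print(f"false acceptance rate: {far:.2f}")
```

Raising the threshold in this sketch trades false acceptances for false rejections, which is why both rates are normally reported together.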
- ...wants,
- Here, a further distinction could be made between
language dependent and language independent systems.
- ...speaker.
- For instance, in
forensic applications, the speaker may not be physically present, or may not
be willing to cooperate.
- ...implement.
- On the telephone, for instance.
- ...likelihood
- The author refers to a probabilistic model,
namely a Hidden Markov Model.
- ...former.
- In fact, the system may be more efficient at
recognising the handset than the speaker.
- ...month).
- One year would be ideal, so that the
influence of the weather is well represented.
- ...
- Except, of course, when the evaluation is carried out on
contemporaneous speech.
- ...amount.
- See for instance [Soong et al. (1987)] for an illustration.
- ...phenomena.
- See again [Soong et al. (1987)] for an illustration.
- ...
- In other words, it is intrinsically easier
to identify 1 speaker among 10 than 1 among 1000, and it
is intrinsically easier to identify 1 male speaker among 1000 adult
speakers than among 1000 male speakers.
- ...
- Sometimes called background speakers.
- ...
- New term that generalises the
concept of cohort.
- ...rejected.
- This concept is no doubt largely
academic.
- ...him.
- This can be the case for
forensic applications.
- ...positively.
- Or who are not even aware that they are being
recorded.
- ...systematically,
- Except himself, if he is also a
registered speaker.
- ...imposture.
- This step is really
necessary. It can happen that a system is more robust to same-sex imposture
than to cross-sex imposture, in particular if the pseudo-impostor bundle of a
given speaker is only composed of speakers of the same sex [Reynolds (1994)].
- ...evaluation:
- The experiment reported here is hypothetical.
- ...reduction.
- In fact, [formula] would be the most relevant figure, especially
for small error rates, but the intuitive meaning of the logarithm of
an error rate is less immediate. This remark also holds for
mistrust rates.
- ...
- Sometimes familiarly called a sheep.
- ...
- Sometimes familiarly
called a goat.
- ...rate,
- The familiar term ram could be used to extend the
animal analogy.
- ...rate.
- The familiar term
lamb seems appropriate here.
- ...separately:
- With [formula] and [formula] being, respectively, the numbers of male and female speakers whose identity was assigned at least once to a test utterance.
- ...
- This property can be
useful for debugging a scoring program.
- ...speaker.
- This remark is also
true for mistrust rates, with the estimates of [formula] being
obtained from the output of the identification system.
- ...
- Wolves, in our animal analogy.
- ...
- The term badgers seems
particularly appropriate!
- ...,
- Another way of estimating the average false acceptance rate and the average imposture rate could consist in computing
[formula],
but these estimates have the drawback that, in general, [formula].
- ...
- In theory, a system using
speaker-dependent thresholds should never perform worse than the same system
using a speaker-independent threshold, as the latter is ultimately a special
case of the former. In the speaker-dependent approach, however, it may be
difficult to obtain reliable estimates of each threshold.
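The "special case" argument in this note can be checked numerically. The sketch below is only an illustration under stated assumptions: the scores, the speaker labels `spk_a` and `spk_b`, the candidate threshold grid and the helper names are all hypothetical, and errors are counted on the same data that is used to choose the thresholds.

```python
def total_errors(genuine, impostor, threshold):
    """Count false rejections plus false acceptances at a threshold."""
    return (sum(1 for s in genuine if s < threshold)
            + sum(1 for s in impostor if s >= threshold))


def best_threshold(genuine, impostor, candidates):
    """Return the candidate threshold with the fewest total errors."""
    return min(candidates, key=lambda t: total_errors(genuine, impostor, t))


# Hypothetical per-speaker scores: {speaker: (genuine, impostor)}.
scores = {
    "spk_a": ([0.9, 0.8, 0.7], [0.5, 0.4, 0.6]),
    "spk_b": ([0.5, 0.45, 0.6], [0.2, 0.1, 0.3]),
}
candidates = [i / 20 for i in range(21)]


def pooled_errors(threshold):
    """Total errors over all speakers when they share one threshold."""
    return sum(total_errors(g, i, threshold) for g, i in scores.values())


# Speaker-independent: a single threshold chosen on the pooled data.
global_threshold = min(candidates, key=pooled_errors)
independent_errors = pooled_errors(global_threshold)

# Speaker-dependent: each speaker gets his or her own best threshold.
dependent_errors = sum(
    total_errors(g, i, best_threshold(g, i, candidates))
    for g, i in scores.values()
)

print("speaker-independent errors:", independent_errors)
print("speaker-dependent errors:  ", dependent_errors)
```

On the tuning data the speaker-dependent count can never exceed the speaker-independent one, because the shared threshold is itself one of the candidates available to every speaker. With thresholds estimated from few utterances and evaluated on new data, the ordering is no longer guaranteed, which is the difficulty the note mentions.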
- ...used:
- Any other combination would not make much sense.
- ...
- This approach is related to the common practice of summarising the
performance of a speaker verification system as
the arithmetic mean of the false rejection rate and
the false acceptance rate obtained for a particular
threshold.
- ...better.
- Hence, the geometric mean is usually a better way to average a false rejection rate and a false acceptance rate for a given threshold.
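As a small illustration of the two ways of averaging, the sketch below computes both means for two hypothetical operating points. The rates are invented for the example, and the sketch only shows how the two summaries are computed and where they differ; it does not reproduce the argument of the main text.

```python
import math


def arithmetic_mean(frr, far):
    """Arithmetic mean of a false rejection and a false acceptance rate."""
    return (frr + far) / 2


def geometric_mean(frr, far):
    """Geometric mean of the two rates (averages their logarithms)."""
    return math.sqrt(frr * far)


# Hypothetical (false rejection rate, false acceptance rate) pairs.
balanced = (0.10, 0.10)
skewed = (0.19, 0.01)

for name, (frr, far) in (("balanced", balanced), ("skewed", skewed)):
    print(name,
          f"arithmetic={arithmetic_mean(frr, far):.4f}",
          f"geometric={geometric_mean(frr, far):.4f}")
```

Both operating points have the same arithmetic mean, whereas the geometric mean separates them, since it responds to the ratio between the two rates and not only to their sum.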
- ...system.
- In fact, a conventional EER [formula] is the
model EER [formula] for any model,
but with a [formula]-validity domain of [formula]!
- ...global
- That is, gender-balanced, average or test set.
- ...tasks,
- In particular, those for which any method will work!
- ...capability.
- Especially for speaker
identification when the number of
speakers is large.
- ...conditions
- A cold, for instance.
- ...
- The so-called Lombard effect.
- ...speaker.
- Not to be recognised or to be
mistaken for somebody else, respectively.
- ...cf.,].
- Moreover, systems have been developed that generate audio
output together with visual output (text
displayed on a computer screen, for foreign language learning
purposes) and/or tactile output (braille,
specifically for the visually handicapped). In such cases
the term multimodal output is often used.
- ...scope.
- This intended use of objective quality measurement bears a
superficial resemblance to the work by [Kryter (1962a), Kryter (1962b)] and associates on the development of the
so-called Articulation Index (AI).
The AI accurately
predicts the loss in intelligibility of speech due to
characteristics of the transmission channel by
automatically comparing the long-term average spectral
characteristics of the input and output speech.
However, no AI-based approach can be used to discriminate
between the desirable (speech-like) and the
undesirable (noise, distortion) features of the input
speech.
- ...account.
- Clearly, there is a continuum from
completely application independent at one end to completely
application specific at the other.
- ...development).
- English presents a special problem in the assignment
of stress. The elements of English compounds are
typically separated by spaces, so that each element is
erroneously treated as a word by itself. Moreover,
the stressing of compounds in English depends partly on the
semantic relationship between the words that
make up the compound, and partly on purely lexical
factors. A comparison of English compound stress
rules developed by linguists and decision rules
automatically extracted from hand-labelled phonetic
databases has been reported by [Sproat et al. (1992)].
- ...(3%).
- On an isolated word
basis, this latter category could be
considered an error; however, the error can
very often be corrected post hoc when strings of
morphologically segmented words are analysed further
by a syntactic parser.
- ...markers.
- Later [Allen et al. (1987)], the duration rules
were evaluated directly (objectively) by comparing the
predicted segment durations with the segment durations
measured in spectrograms of new paragraphs
read by the designated speaker. The rules accounted for 84%
of the duration variance, with a residual
standard deviation of 17 ms (excluding the prediction of
pause duration). A difference of 17 ms is generally less
than the just noticeable difference for a duration change
in a single segment in a sentence context [Klatt (1976)], which would
explain why the human reference and the
rule-derived durations were judged equally
natural.
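The two figures quoted in this note (proportion of variance accounted for, and residual standard deviation) can be computed from paired measured and predicted durations. The following is a minimal sketch under stated assumptions: the durations are invented, the function names are hypothetical, and "variance accounted for" is taken here as one minus the ratio of residual to total sum of squares, which may differ from the exact statistic used in the cited study.

```python
import math


def variance_accounted_for(measured, predicted):
    """Proportion of variance in `measured` explained by `predicted`.

    Assumption of this sketch: 1 - (residual SS / total SS).
    """
    mean_measured = sum(measured) / len(measured)
    total_ss = sum((m - mean_measured) ** 2 for m in measured)
    residual_ss = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return 1.0 - residual_ss / total_ss


def residual_std(measured, predicted):
    """Standard deviation of the prediction errors (population form)."""
    residuals = [m - p for m, p in zip(measured, predicted)]
    mean_residual = sum(residuals) / len(residuals)
    return math.sqrt(
        sum((r - mean_residual) ** 2 for r in residuals) / len(residuals)
    )


# Hypothetical segment durations in milliseconds, for illustration only.
measured_ms = [60, 80, 100, 120, 140]
predicted_ms = [70, 75, 105, 110, 140]

vaf = variance_accounted_for(measured_ms, predicted_ms)
rsd = residual_std(measured_ms, predicted_ms)
print(f"variance accounted for: {vaf:.1%}")
print(f"residual standard deviation: {rsd:.1f} ms")
```

The residual standard deviation is the figure to compare with a just noticeable difference, since both are expressed in milliseconds.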
- ...Test
- The notion "function test" in this sense
has no
relationship with our use of the term "functional test". In
the SAM Prosodic Function Test,
prosodic quality is not
tested in a functional task: we are still
dealing with intuitive judgments (ratings) of how well the
melody would fulfil its function, without
actually testing it.
- ...built.
- We will encounter a similar
problem when we come to consider system evaluation. It is possible to
evaluate the users' perceptions of the system's usability but it is not
possible to test its performance against a clear specification of what
the system should be capable of, because this is, strictly speaking,
unknown.
- ...necessary.
- This use of the term language model
should not be confused with current usage in
speech technology to refer to stochastic grammars or word context models
which limit the search space for word hypotheses
in automatic speech recognition. See Chapter 7. [Technical Editor]
- ..."trouble".
- "Trouble" is a general term
to describe a broad class of dialogue problems such as those
caused by speech recognition or parsing failures, misunderstandings, illogical
or inconsistent utterances or belief states, etc.
- ...standard
- Unicode is a trademark of Unicode, Inc.
- ...represented.
- Suggestions
for the transcription of pathological speech made at the Kiel IPA Convention
envisage this sort of detail, but they are still under discussion and have not
yet been considered with respect to computer transcription.
- ...utterance:
- The symbol combinations used here
are only examples, taken from a system used at UCL. They merely serve to
illustrate the type of fine acoustic categories under discussion. Other
symbols or symbol combinations could be and have been used by others to represent similar categories.
- ...life
- y is years. Times given are manufacturers' warranties and
claims respectively. For CD-Rs, most manufacturers offer only one-year warranties, which means that
data has to be written to the CD-R within one year.
- ...
- t is rewind time for the entire tape
- ...b
- b is bits/s
- ...
- cost estimates in US$ per device or Megabyte
- ...device
- for devices with interchangeable medium only
- ...config
- standard configuration of a high-end PC (Pentium or PowerPC processor) or middle range UNIX (Sparc) workstation
- ...paper
- T.I. Boogaart (PTT Research), L. Bos (SPEX) & L. Boves (Nijmegen University).
Paper presented at IVTTA 1994, Japan.
- ...(OGI)
- OGI has collected an
11-language telephone-based corpus which has been used for common
evaluation of language identification algorithms. This corpus is
currently available through the LDC.