Comparability across languages

Although it is generally agreed that in the final analysis all languages are equally complex, it cannot be denied that phonetic and phonological complexity differs widely from one language to the next. Languages differ in the size of their vowel and consonant inventories, in the complexity of syllable structures, stress rules, reduction processes, and so forth.

A number of systems are (commercially) available that provide multilingual speech output (e.g. DECTalk, INFOVOX, MULTIVOX, APOLLO). Generally, such systems were primarily developed for one language (American English, Swedish, Hungarian, and British English, respectively), and additional language modules were derived from the original language by making minimal changes to the basic units and rules. As a result, it is commonly observed that the derived languages of multilingual systems sound poorer than the original. Yet it is very difficult to establish this convincingly, since the poorer performance may be due (completely or in part) to the greater intrinsic difficulty of the sound system of the new language. Ultimately one would like to develop speech output assessment techniques that allow us to determine the quality of a system speaking language A and to compare it with that of another system speaking language B. In order to reach this objective, we would have to know how to weight the scores obtained for a language by the intrinsic difficulty or complexity of the relevant aspects of that language.

Such goals will not easily be accomplished. However, steps have been taken in the SAM project to ensure optimal cross-language comparability in the construction of the test materials and administration procedures. For example, in the Semantically Unpredictable Sentence Test (SUS Test, see Section 12.7.7), the same five syntactic structures (defined as linear sequences of functional parts of speech, e.g. Subject-Verb-Direct Object) are used in all languages tested. The words substituted in each of the designated syntactic slots are selected from the same lexical categories and have the shortest word length allowed by the language. It should be obvious, however, that complete congruence cannot be obtained in this fashion: the shortest content words in Italian and Spanish are typically disyllables, while they are monosyllabic in French and the Germanic languages. Similarly, although all five syntactic structures occur in each of the languages tested, certain structures will be more common in one language than in another. Given the existence of such intrinsic and unavoidable structural differences between languages, we recommend further research into the development of valid cross-language normalisation measures.
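
The slot-filling procedure described for the SUS Test can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lexicons, their syllable counts, the representation of a structure as a tuple of category labels, and the function names are hypothetical and are not taken from the SAM materials. The sketch only shows how a "shortest word length allowed by the language" rule yields monosyllables in one language and disyllables in another.

```python
import random

# Hypothetical toy lexicons: category -> {word: syllable count}.
# The words and counts are invented for illustration only.
LEXICON_EN = {
    "DET": {"the": 1},
    "NOUN": {"chair": 1, "law": 1, "sea": 1, "table": 2, "window": 2},
    "VERB": {"drew": 1, "kept": 1, "followed": 2},
}
LEXICON_IT = {
    "NOUN": {"casa": 2, "mare": 2, "tavolo": 3},
}

# One structure as a linear sequence of category slots
# (an assumed encoding of Subject-Verb-Direct Object).
SVO = ("DET", "NOUN", "VERB", "DET", "NOUN")


def shortest_candidates(category, lexicon):
    """Return the words of a category that have the minimum syllable count."""
    entries = lexicon[category]
    min_len = min(entries.values())
    return sorted(word for word, n in entries.items() if n == min_len)


def make_sus(structure, lexicon, rng):
    """Fill every slot with a randomly chosen shortest word of its category."""
    return " ".join(
        rng.choice(shortest_candidates(category, lexicon))
        for category in structure
    )


rng = random.Random(0)  # fixed seed so the draw is repeatable
sentence = make_sus(SVO, LEXICON_EN, rng)
```

With these toy lexicons, the English noun slot is filled from monosyllables only, whereas the shortest Italian nouns available are disyllabic, which is the kind of residual mismatch between languages that the same selection rule cannot remove.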

Especially when working within the European Union, with its increasing number of partner countries and languages, speech output products are likely to be produced on a multilingual basis. The further development of efficient testing procedures that can be validly used for all relevant languages is a clear priority. Yet, as explained above, we should not raise our hopes too high in this matter, given the existence of intrinsic and unavoidable structural differences between languages. For this reason we recommend parallel research into the development of valid cross-language normalisation measures that will allow us to realistically compare speech output test results across languages, if the choice of test materials cannot be balanced in all relevant linguistic aspects.

In this effort, ITU Recommendation P.85 has potential. Following this procedure (see Section 12.3.4), a reference grid can be constructed for each (EU) language. One possible outcome could be that some languages prove more resistant to time-frequency warping than others, although we hesitate to make any predictions. Be this as it may, differences in intelligibility between languages would be effectively normalised out when we determine the quality of an output system relative to the reference grid that is applicable for the language being tested.
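
As a rough illustration of scoring a system relative to a language-specific reference grid, the sketch below maps a raw test score onto the distortion level at which the reference speech of the same language obtains that score. The grid values, the linear interpolation between grid points, the assumption that reference scores fall strictly as distortion rises, and the function name are all hypothetical; the actual construction of the grid is described in Section 12.3.4 and is not reproduced here.

```python
def equivalent_distortion(score, grid):
    """Map a raw score to the reference distortion level giving the same score.

    grid is a list of (distortion_level, reference_score) pairs for one
    language. Reference scores are assumed to decrease strictly as the
    distortion level increases. Scores outside the grid are clamped to
    the nearest end point.
    """
    points = sorted(grid)  # ascending distortion level
    if score >= points[0][1]:
        return points[0][0]
    if score <= points[-1][1]:
        return points[-1][0]
    for (d0, s0), (d1, s1) in zip(points, points[1:]):
        if s1 <= score <= s0:
            # Linear interpolation between two neighbouring grid points.
            return d0 + (s0 - score) * (d1 - d0) / (s0 - s1)


# Invented grids: language B is assumed to degrade faster than language A.
GRID_A = [(0.0, 98.0), (2.5, 90.0), (5.0, 75.0), (10.0, 40.0)]
GRID_B = [(0.0, 95.0), (2.5, 80.0), (5.0, 60.0), (10.0, 30.0)]

level_a = equivalent_distortion(75.0, GRID_A)  # system speaking language A
level_b = equivalent_distortion(60.0, GRID_B)  # system speaking language B
```

In this invented case the raw scores differ (75 against 60), but both systems come out at the same equivalent distortion level on their own grids, so the language-dependent part of the difference drops out of the comparison.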

EAGLES SWLG SoftEdition, May 1997.