
Further languages

A number of other EC languages have been examined in the light of the SAMPA recommendations, and a short summary of the possible solutions for their special features is given here. For more details, see J. Wells, "Computer-coded phonetic transcription", Journal of the International Phonetic Association 17, No. 2, pp. 94-114, and the SAM Definition Phase Final Report (ESPRIT project 1541), January 1988.

Most of the minority languages of Europe, such as Basque, Breton, Catalan, and Frisian, can be transcribed adequately at a phonemic level without the need to change the principles of the present recommendation. Irish and Scottish Gaelic, however, require a decision on how to code the palatalised (or "slender") consonants and the "double" nasals and laterals. Scottish Gaelic also has a back unrounded vowel series which does not occur in other EC languages. Welsh requires a solution for the voiceless alveolar lateral fricative, represented in the orthography as <ll>.

We should like now to explore whether it would be suitable to extend SAMPA for application to other languages, including Chinese, and if so how.

The question of Chinese has arisen because of the prospect of a wider collaboration on speech research between University College London and the Chinese Academy of Sciences.

Chinese already has what appears to be a satisfactory machine-readable phonetic notation in the form of Pinyin, the romanisation that has for some years been standard in the People's Republic (though not in Taiwan). Pinyin is an ingenious quasi-phonemic notation. It includes a number of unconventional digraphs, together with unconventional uses of individual Latin letters. Thus sh, ch, and zh represent retroflex/postalveolar consonants of a type that would normally be written in SAMPA as [S, tS, dZ]. Pinyin x, q, j represent a corresponding series of alveolopalatal consonants, IPA [ɕ, tɕ, dʑ], for which SAMPA does not currently cater. Pinyin c represents [ts], y [j], and ng [ŋ]. The close front rounded vowel [y] is written u where there would be no confusion with [u], but ü where this confusion might arise. (This last Pinyin character is not actually machine-readable in our sense.)
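The correspondences listed above can be illustrated with a small lookup table. The following Python sketch is only an illustration under stated assumptions, not part of the SAMPA recommendation: the names PINYIN_TO_SAMPA and transcribe are hypothetical, the velar nasal is written with the SAMPA symbol N (the text gives only the IPA form), letters not discussed in the text are simply passed through unchanged, and the alveolopalatal series is marked None because SAMPA does not currently cater for it.

```python
# Pinyin spellings discussed in the text and the symbols given for them.
# None marks the alveolopalatal series, which SAMPA does not yet cover.
# "N" for the velar nasal is an assumption: the text gives only IPA.
PINYIN_TO_SAMPA = {
    "sh": "S", "ch": "tS", "zh": "dZ",
    "x": None, "q": None, "j": None,
    "c": "ts", "y": "j", "ng": "N",
}


def transcribe(syllable):
    """Split a toneless Pinyin syllable, trying digraphs before single letters.

    Returns a list of (pinyin_spelling, symbol) pairs.
    """
    out = []
    i = 0
    while i < len(syllable):
        for width in (2, 1):
            chunk = syllable[i:i + width]
            if chunk in PINYIN_TO_SAMPA:
                out.append((chunk, PINYIN_TO_SAMPA[chunk]))
                i += len(chunk)
                break
        else:
            # Letter not discussed in the text: pass it through unchanged
            # (an assumption of this sketch, not a claim about SAMPA).
            out.append((syllable[i], syllable[i]))
            i += 1
    return out


print(transcribe("shang"))
print(transcribe("xi"))
```

Trying the two-letter spellings first keeps sh, ch, zh, and ng from being split into their component letters; a None in the output flags a sound for which a coding decision is still needed.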

Continuing to use Pinyin for Chinese but SAMPA for other languages would mean that characters such as "x" and "j" would have different meanings in different languages ("x" = alveolopalatal fricative, or velar fricative; "j" = alveolopalatal affricate, or palatal approximant). But this is perhaps no worse than the "comparative" differences already present in the interpretation of some symbols (see above). The Pinyin notation "i" already covers a remarkable range of allophonic possibilities (including an r-coloured back vowel in shi and a slightly fricative central vowel in si). Are Chinese speech technologists happy with this degree of phonemic abstraction?

Tone is shown in Pinyin (if it is shown at all) by accent marks placed above the vowel, thus mā, má, mǎ, mà. These are not machine-readable in the SAMPA sense. The corresponding SAMPA tone-marks would be /"ma, 'ma, `'ma, ` ma/. However, these SAMPA signs have not proved popular, and perhaps ought to be changed. For Chinese, we could perhaps consider instead the use of numerals, thus "ma1, ma2, ma3, ma4".
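As an illustration of the numeral notation suggested here, the following Python sketch converts a tone-marked Pinyin syllable into a base spelling followed by a tone digit. It is a minimal sketch under stated assumptions: the function name tone_marks_to_numerals is hypothetical, the input is a single syllable, the four tones are taken to be marked by macron, acute, caron, and grave in that order, and an unmarked syllable is left without a numeral.

```python
import unicodedata

# Combining diacritics assumed to mark the four Pinyin tones.
TONE_MARKS = {
    "\u0304": "1",  # macron, as in mā
    "\u0301": "2",  # acute, as in má
    "\u030c": "3",  # caron, as in mǎ
    "\u0300": "4",  # grave, as in mà
}


def tone_marks_to_numerals(syllable):
    """Return the syllable with its tone mark replaced by a trailing digit."""
    # NFD splits each accented vowel into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", syllable)
    tone = ""
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)
    # NFC recomposes anything that is not a tone mark, e.g. the dots of ü.
    return unicodedata.normalize("NFC", "".join(kept)) + tone


for s in ["m\u0101", "m\u00e1", "m\u01ce", "m\u00e0"]:
    print(tone_marks_to_numerals(s))
```

The sketch removes only the tone diacritic, so ü survives as a non-ASCII character in the output; a fully machine-readable notation would still need a separate decision on how to code that vowel.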



EAGLES SWLG SoftEdition, May 1997.