Spoken language systems are becoming increasingly versatile, and a central task in developing such a system is the collation of lexical information. Lexical information is required both for characterising the properties of words in a spoken language corpus (see Chapter 5) in a lexical database or knowledge base, and for the development of practically all system components. In related areas such as natural language processing (NLP) and computational and theoretical linguistics, the lexicon has come to play an increasingly central role. The lexicon of a spoken language system may be designed for broad or narrow coverage, for specific applications, with a particular kind of organisation, and optimised for a specific strategy of lexical search. Since the construction of a lexicon is highly labour-intensive and therefore error-prone, a prime requirement is to formalise lexical representations, to automate lexicon development as far as possible, and to re-use lexical resources from existing applications in new developments.
The main object of this chapter is to provide a framework for relating these concepts to each other and for formulating recommendations on the development and use of lexica for spoken language systems.
In this introductory section, some basic concepts connected with the use and structure of lexica in spoken language systems are outlined. In the following sections, specific dimensions of spoken language lexica are discussed in more detail. Particular attention is paid to lexical properties related to inflectional morphology, which is far more important for many other languages than it is for English, and to other aspects of morphology that are important for the treatment of out-of-vocabulary words. Discussion is restricted to spoken language lexica as system development resources; non-electronic lexica for human use (e.g. pronunciation dictionaries in book form) are not considered. Features common to spoken and written language lexica, such as syntactic and semantic information in lexical entries, are mentioned only in passing; see the report of the EAGLES Working Group on Computational Lexica on these points. The close relation between spoken language lexica and speech corpora results in some overlap with the Spoken Language Corpus chapter of this handbook.
The following sections of the chapter are concerned with basic features of spoken language lexica, types of lexical information, lexicon structure, lexical access, and lexical knowledge acquisition for spoken language lexica.