Lexicon architecture and lexical database structure


Lexicon architecture pertains to the choice of basic objects and properties in the lexicon, and to the overall structure of the lexicon. More formally, it defines the relation which assigns lexical properties to lexical entries.
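The relational view can be made concrete with a small sketch. It assumes, purely for illustration, that a lexicon is stored as a set of (entry, attribute, value) triples; the entry keys, attribute names and SAMPA-like transcriptions below are invented and do not come from any particular lexicon.

```python
from collections import defaultdict

# A lexicon modelled as a relation: a set of (entry, attribute, value) triples.
# All keys, attribute names and values are hypothetical illustrations.
LEXICON_RELATION = {
    ("table", "orthography", "table"),
    ("table", "pronunciation", "t eI b l"),
    ("table", "category", "noun"),
    ("run", "orthography", "run"),
    ("run", "pronunciation", "r V n"),
    ("run", "category", "noun"),
    ("run", "category", "verb"),
}


def properties_of(entry, relation):
    """Collect every property that the relation assigns to one entry."""
    props = defaultdict(set)
    for key, attribute, value in relation:
        if key == entry:
            props[attribute].add(value)
    return dict(props)
```

Because the lexicon is a relation rather than a function, one entry may carry several values for the same attribute, as with the two categories assigned to "run" here.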

The term "architecture" generally refers to the structure of system lexica, but the term is also justified in connection with lexical database structure, particularly when more complex relational or object-oriented structures are concerned.

The basic objects in terms of which an architecture may be defined were discussed in the section on lexical information for spoken language.

The overall structure of a spoken language lexicon is determined by a range of declarative, procedural and operational criteria such as the following:

At one extreme is the ideal notion of a fully integrated sign-based model with non-redundant specification of entries and property inheritance; in between is the efficient database management system used for large-scale lexical databases (see Appendix H on DBMSs); and at the other extreme is the simple pronunciation table which is the starting point for the training of speech recognition devices.
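The first of these extremes, non-redundant specification with property inheritance, can be sketched as follows. This is a minimal illustration, assuming a single-parent class hierarchy in which an entry states only what distinguishes it from its class; the names `CLASSES`, `ENTRIES` and `resolve`, and all property values, are hypothetical.

```python
# Hypothetical class hierarchy: general properties are stated once, on the class.
CLASSES = {
    "word": {"parent": None, "props": {}},
    "noun": {"parent": "word", "props": {"category": "noun", "plural_suffix": "s"}},
}

# Entries specify only what is idiosyncratic; everything else is inherited.
ENTRIES = {
    "table": {"parent": "noun", "props": {"pronunciation": "t eI b l"}},
    "ox": {"parent": "noun", "props": {"pronunciation": "Q k s", "plural_suffix": "en"}},
}


def resolve(key):
    """Return the full property set of an entry, the most specific value winning."""
    node = ENTRIES[key]
    chain = [node["props"]]
    parent = node["parent"]
    while parent is not None:
        chain.append(CLASSES[parent]["props"])
        parent = CLASSES[parent]["parent"]
    resolved = {}
    for props in reversed(chain):  # most general first, so specific values override
        resolved.update(props)
    return resolved
```

In this sketch the regular plural suffix is stored once on the noun class, and only the exceptional entry "ox" overrides it.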

The choice of lexicon architecture on the basis of parameters such as those listed above, and taking into account practical constraints from the actual working environment, is application specific. There is no single principle of organisation which applies to all lexica.

The closest approximation to a neutral form of spoken language lexicon organisation is a sign-based general background lexicon organised as a database with flexible access. Such a lexicon is basically oriented towards knowledge acquisition, and can function as a source for the specialised lexica required for different speech synthesis and recognition applications. Specialised models for sublexica which are optimised for particular applications can then be formulated, and sublexica can be automatically compiled out of the main lexicon into application-specific notations and structures.
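The compilation of sublexica out of a main lexicon can be illustrated with a short sketch. It assumes the main lexicon is a list of records with the invented fields `orthography`, `pronunciation`, `category` and `gloss`, and shows two hypothetical target structures: a tab-separated pronunciation table and a category index.

```python
# Hypothetical main lexicon: one record per entry, with illustrative fields.
MAIN_LEXICON = [
    {"orthography": "table", "pronunciation": "t eI b l",
     "category": "noun", "gloss": "piece of furniture"},
    {"orthography": "run", "pronunciation": "r V n",
     "category": "verb", "gloss": "move fast on foot"},
]


def compile_pronunciation_table(lexicon):
    """Project the main lexicon onto a sorted, tab-separated pronunciation table."""
    lines = sorted(f"{e['orthography']}\t{e['pronunciation']}" for e in lexicon)
    return "\n".join(lines)


def compile_category_index(lexicon):
    """Group orthographic forms by syntactic category."""
    index = {}
    for e in lexicon:
        index.setdefault(e["category"], []).append(e["orthography"])
    return index
```

Each compiler selects only the properties its application needs, so the main lexicon remains the single source from which the specialised forms are derived.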

The organisation of a lexicon determines the general properties of the formalism to be used with the lexicon. Conversely, available formalisms determine tractable forms of lexicon organisation in terms of data structures, algorithms and programming environments [Knuth (1973), Carbonell & Pierrel (1986), Rudnicky et al. (1987), Lacouture & Normandin (1993)]. Object-oriented system architectures, with local encapsulation of all aspects of representation and processing, permit the construction of hybrid systems with functionally optimised components; by analogy, the lexicon itself can be conceived as a hybrid object system if required.

This is in effect the situation in current speech recognition technology, in which a more or less large set of HMMs representing words, for instance, can be seen as a procedurally sophisticated lexicon with acoustically driven lookup of keys which are then used to access the main lexicon. Although the standard perspective is to see the two components as separate, they can both be seen as objects located in the hybrid lexicon components of a spoken language system.
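The two-stage lookup described here can be sketched in a few lines. The sketch does not implement HMM decoding: as a loudly labelled stand-in, each word model is a toy scoring function that counts position-wise symbol matches against a template, and all names (`make_toy_model`, `WORD_MODELS`, `recognise`, `lookup`) and data are hypothetical.

```python
def make_toy_model(template):
    """Stand-in for a word HMM: score = number of position-wise symbol matches."""
    def score(observations):
        return sum(1 for a, b in zip(observations, template) if a == b)
    return score


# Procedural component: one model per lexical key (toy models, not real HMMs).
WORD_MODELS = {
    "table": make_toy_model(["t", "eI", "b", "l"]),
    "run": make_toy_model(["r", "V", "n"]),
}

# Declarative component: the main lexicon, accessed by key.
MAIN_LEXICON = {
    "table": {"category": "noun"},
    "run": {"category": "verb"},
}


def recognise(observations, word_models):
    """Acoustically driven lookup: return the key of the best-scoring model."""
    return max(word_models, key=lambda key: word_models[key](observations))


def lookup(observations):
    """Use the recognised key to access the main lexicon."""
    key = recognise(observations, WORD_MODELS)
    return key, MAIN_LEXICON[key]
```

The point of the sketch is only the division of labour: the model set maps a signal-like input to a key, and the key then retrieves the remaining lexical properties.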

Current research on new object-oriented, interactive, incremental spoken language system architectures raises many new questions about the role of a lexicon. Major questions include whether the lexicon is an object (or system of objects) in its own right or is instead distributed over the system components and is thus a virtual lexicon, and which components of the system interact directly, e.g. morphology and word semantics, or sentence parsing and propositional semantics. Questions such as these are the subject of ongoing basic research, and it would be premature to make specific recommendations at this point.

For a broader discussion of lexicon architectures, the work of the EAGLES Working Group on Computational Lexica should be consulted.


