Lexicon architecture and lexical database structure


Lexicon architecture pertains to the choice of basic objects and properties in the lexicon, and to the overall structure of the lexicon. More formally, it defines the relation which assigns lexical properties to lexical entries.
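The idea of a relation between entries and properties can be made concrete with a minimal sketch. It assumes a lexicon stored as a set of (entry, property, value) triples; the data, the property names and the function names are all hypothetical and serve only as illustration.

```python
from collections import defaultdict

# Hypothetical toy data: the lexicon as a relation, i.e. a set of
# (entry, property, value) triples. Transcriptions are illustrative only.
LEXICON_RELATION = {
    ("table", "orthography", "table"),
    ("table", "pronunciation", "t eI b l"),
    ("table", "category", "noun"),
    ("run", "category", "noun"),
    ("run", "category", "verb"),
}


def properties_of(relation, entry):
    """Group every value the relation assigns to one entry by property."""
    grouped = defaultdict(set)
    for e, prop, value in relation:
        if e == entry:
            grouped[prop].add(value)
    return dict(grouped)


def entries_with(relation, prop, value):
    """Read the same relation in the other direction: property to entries."""
    return {e for e, p, v in relation if p == prop and v == value}
```

Because a relation rather than a function is used, one entry may carry several values for the same property (here, "run" is both noun and verb), and the same data can be queried from entry to property or from property to entry.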

The term "architecture" generally refers to the structure of system lexica, but the term is also justified in connection with lexical database structure, particularly when more complex relational or object-oriented structures are concerned.

The basic objects in terms of which an architecture may be defined were discussed in the section on lexical information for spoken language.

The overall structure of a spoken language lexicon is determined by a range of declarative, procedural and operational criteria such as the following:

The complexity of the information assigned to lexical entries.
The complexity of the relations defined between lexical entries.
The particular subset of objects and properties defined for a given application lexicon.
Linguistic and logical compression techniques such as redundancy rules or, more generally, inheritance hierarchies.
Task-driven directionality of access.
Variety of information required for access (from phonological to pragmatic).
Performance requirements of software (including lingware): size and speed of access.
Techniques of acquisition and maintenance (with respect to, for instance, consistency).
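As an illustration of the compression criterion in the list above, the following is a minimal sketch of property inheritance with overriding defaults. The class hierarchy, the entries and the property names are hypothetical; real inheritance lexica are considerably richer.

```python
# Hypothetical class hierarchy: each class names its parent and states only
# the properties it adds or overrides.
CLASSES = {
    "word": {"parent": None, "props": {}},
    "noun": {"parent": "word",
             "props": {"category": "noun", "plural_suffix": "s"}},
}

# Hypothetical entries: only idiosyncratic information is stored locally.
ENTRIES = {
    "table": {"class": "noun", "props": {}},
    "ox": {"class": "noun", "props": {"plural_suffix": "en"}},
}


def resolve(entry_name):
    """Return the full property set of an entry.

    Properties are collected from the most general class down to the entry
    itself, so more specific statements override inherited defaults.
    """
    entry = ENTRIES[entry_name]
    chain = []
    cls = entry["class"]
    while cls is not None:
        chain.append(CLASSES[cls]["props"])
        cls = CLASSES[cls]["parent"]
    resolved = {}
    for props in reversed(chain):  # most general class first
        resolved.update(props)
    resolved.update(entry["props"])  # entry-specific values win
    return resolved
```

In this sketch the regular entry "table" stores nothing of its own, while the irregular entry "ox" stores only the one property in which it departs from its class.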

At one extreme is the ideal notion of a fully integrated sign-based model with non-redundant specification of entries and property inheritance; in between is the efficient database management system used for large-scale lexical databases (see Appendix H on DBMSs); and at the other extreme is the simple pronunciation table which is the starting point for the training of speech recognition devices.

The choice of lexicon architecture on the basis of parameters such as those listed above, and taking into account practical constraints from the actual working environment, is application specific. There is no single principle of organisation which applies to all lexica.

The closest approximation to a neutral form of spoken language lexicon organisation is a sign-based general background lexicon organised as a database with flexible access. Such a lexicon is basically knowledge acquisition oriented, and can function as a source for the specialised lexica required for different speech synthesis and recognition applications. Specialised models for sublexica which are optimised for particular applications can then be formulated, and sublexica can be automatically compiled out of the main lexicon into application-specific notations and structures.
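The compilation of sublexica from a background lexicon can be sketched as follows. This is an illustration under stated assumptions, not a description of any particular system: the record fields (`orth`, `phon`, `cat`), the sample data and both output formats are hypothetical.

```python
# Hypothetical background lexicon: one record per sign, holding more
# information than any single application needs. Transcriptions are
# illustrative only.
BACKGROUND_LEXICON = [
    {"orth": "read", "phon": "r i: d", "cat": "verb"},
    {"orth": "reed", "phon": "r i: d", "cat": "noun"},
    {"orth": "table", "phon": "t eI b l", "cat": "noun"},
]


def compile_synthesis_table(lexicon):
    """Compile an orthography-to-pronunciation table as tab-separated text."""
    lines = sorted({f"{e['orth']}\t{e['phon']}" for e in lexicon})
    return "\n".join(lines)


def compile_recognition_index(lexicon):
    """Compile the opposite access direction: pronunciation to orthographies."""
    index = {}
    for e in lexicon:
        index.setdefault(e["phon"], []).append(e["orth"])
    return index
```

Both sublexica are derived from the same source, so a correction made in the background lexicon reaches every application lexicon on the next compilation. The recognition index also shows why direction of access matters: the homophones "read" and "reed" share one key.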

The organisation of a lexicon determines the general properties of the formalism to be used with the lexicon. Conversely, available formalisms determine tractable forms of lexicon organisation in terms of data structures, algorithms and programming environments [Knuth (1973), Carbonell & Pierrel (1986), Rudnicky et al. (1987), Lacouture & Normandin (1993)]. Object-oriented system architectures, with local encapsulation of all aspects of representation and processing, permit the construction of hybrid systems with functionally optimised components; by analogy, the lexicon itself can be conceived as a hybrid object system if required.

This is in effect the situation in current speech recognition technology, in which a more or less large set of HMMs representing words, for instance, can be seen as a procedurally sophisticated lexicon with acoustically driven lookup of keys which are then used to access the main lexicon. Although the standard perspective is to see the two components as separate, they can be seen as objects which are both located in hybrid spoken language system lexicon components.
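The two-stage lookup described here can be sketched in a few lines. The sketch deliberately replaces the word HMMs with a stand-in: each word "model" is a fixed template vector scored by negative squared distance. All names, vectors and lexical properties are hypothetical; only the control flow (acoustic evidence selects a key, the key accesses the main lexicon) is the point.

```python
# Hypothetical main lexicon, accessed by key.
MAIN_LEXICON = {
    "yes": {"category": "interjection", "meaning": "affirmation"},
    "no": {"category": "interjection", "meaning": "negation"},
}

# Stand-in for a set of word HMMs: one template vector per word.
# A real recogniser would score observation sequences against trained models.
WORD_MODELS = {
    "yes": [0.9, 0.1],
    "no": [0.1, 0.8],
}


def score(template, observation):
    """Higher is better: negative squared distance to the template."""
    return -sum((t - o) ** 2 for t, o in zip(template, observation))


def recognise_key(observation):
    """Stage 1: acoustically driven lookup yields a lexical key."""
    return max(WORD_MODELS, key=lambda w: score(WORD_MODELS[w], observation))


def lookup(observation):
    """Stage 2: the key is used to access the main lexicon."""
    key = recognise_key(observation)
    return key, MAIN_LEXICON[key]
```

Seen this way, `WORD_MODELS` and `MAIN_LEXICON` are two lexical objects joined by a shared key set, which is one way of reading the hybrid perspective suggested in the text.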

Current research on new object-oriented, interactive, incremental spoken language system architectures raises many new questions about the role of a lexicon. One major question is whether the lexicon is an object (or system of objects) in its own right, or whether it is distributed over the system components and is thus a virtual lexicon. A further question is which components of the system interact directly, e.g. morphology and word semantics, or sentence parsing and propositional semantics. Questions such as these are the subject of ongoing basic research, and it would be premature to make specific recommendations at this point.

For a broader discussion of lexicon architectures, the work of the EAGLES Working Group on Computational Lexica should be consulted.


EAGLES SWLG SoftEdition, May 1997.