Next: Recommendations on morphology
Up: Morphological information
Previous: Types of morphological information
Morphological structuring is useful for the following tasks:
- The treatment of large vocabularies for speech recognition and synthesis by means of rule-based generation of inflected forms from stems.
- The prediction of new (unattested, unknown) words for speech recognition on the basis of known principles of word composition, and known attested parts of words.
- Rule-based assignment of stress patterns.
- Word recognition by stem spotting.
- Construction of subword language models for speech recognition.
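As a rough illustration of stem spotting, the following minimal sketch works on spelled word forms rather than on the speech signal, and looks for the longest known stem at the start of a form. The stem list and the function name `spot_stem` are hypothetical and chosen only for this example.

```python
# Hypothetical toy stem lexicon; a real system would draw on a lexical database.
STEMS = {"algorithm", "tree", "bless"}


def spot_stem(word_form, stems=STEMS):
    """Return (stem, remainder) for the longest known stem that starts
    word_form, or None if no known stem is found."""
    # Try the longest candidate first so that longer stems win.
    for end in range(len(word_form), 0, -1):
        candidate = word_form[:end]
        if candidate in stems:
            return candidate, word_form[end:]
    return None


print(spot_stem("algorithms"))  # ('algorithm', 's')
print(spot_stem("xyz"))         # None
```

The sketch only handles stems at the beginning of a form and ignores spelling alternations, so it is a starting point rather than a usable recogniser component.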
There are two main ways of structuring words internally into word subunits (word constituents):
- SEMANTIC ORIENTATION. On morphological grounds, word forms may be decomposed into smaller meaningful units, the smallest of which are morphs, the phonological forms of morphemes; an intermediate unit between the morph and the word form is the stem.
- PHONOLOGICAL ORIENTATION. On phonological grounds, word forms may be decomposed into smaller pronunciation units, the smallest of which are phonemes; an intermediate pronunciation unit is the syllable.
It is important to note that decomposition into syllables is not isomorphic with decomposition into morphs. For example, phonological has the syllable structure /fɒ . nə . lɒ . dʒɪ . kl/ and the morph structure /fɒn + ə + lɒdʒɪk + l/, which are quite different from each other.
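The mismatch between the two decompositions can be made concrete by comparing boundary positions. The sketch below is an illustration only: the broad transcription of phonological and its division into segments are assumptions made for this example, and `boundaries` is a hypothetical helper.

```python
def boundaries(units):
    """Return the internal boundary offsets (counted in segments) of a
    segmentation given as a list of units, each a list of segments."""
    offsets, position = set(), 0
    for unit in units[:-1]:  # the end of the last unit is the word end
        position += len(unit)
        offsets.add(position)
    return offsets


# Assumed broad transcription; each unit is a list of segments so that
# multi-character symbols such as "dʒ" count as one segment.
syllables = [["f", "ɒ"], ["n", "ə"], ["l", "ɒ"], ["dʒ", "ɪ"], ["k", "l"]]
morphs = [["f", "ɒ", "n"], ["ə"], ["l", "ɒ", "dʒ", "ɪ", "k"], ["l"]]

syllable_cuts = boundaries(syllables)
morph_cuts = boundaries(morphs)

print(sorted(syllable_cuts))               # [2, 4, 6, 8]
print(sorted(morph_cuts))                  # [3, 4, 9]
print(sorted(syllable_cuts & morph_cuts))  # [4]
```

Under these assumptions only one internal boundary is shared, which is why a lexicon generally has to store or compute the two structures separately.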
In addition to phonological decomposition, in the written mode word forms may be decomposed into smaller spelling units, graphemes, each consisting of one or more characters. An intermediate orthographic unit is the orthographic break (orthographic syllable), which is in general only needed for splitting words at line-breaks and does not correspond closely to either syllable or morph boundaries but combines phonological, morphological and orthographic criteria.
As already noted, in many languages syllables and morphs do not always coincide; morphs may be smaller or larger than syllables.
For the core requirements of speech recognition, in which a closed vocabulary of attested fully inflected word forms is generally used, morphological structuring is not necessary. Phonological structuring into syllables, demisyllables, diphone sequences or phonemes is widely used in order to increase statistical coverage and to capture details of pronunciation [Browman (1980), Ruske & Schotola (1981), Ruske (1985)].
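As an illustration of one such phonological structuring, the following minimal sketch turns a phoneme sequence into a sequence of diphone labels, i.e. labels for the transitions between adjacent phonemes. The boundary symbol `#`, the label format and the function name `diphones` are assumptions made for this example, not a convention taken from the works cited.

```python
def diphones(phonemes, boundary="#"):
    """Return labels for the transitions between adjacent phonemes,
    including the transitions from and to the word boundary."""
    padded = [boundary] + list(phonemes) + [boundary]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# Assumed broad transcription of "knife" as three phonemes.
print(diphones(["n", "aɪ", "f"]))  # ['#-n', 'n-aɪ', 'aɪ-f', 'f-#']
```

A word of n phonemes yields n + 1 labels, which illustrates how an inventory of transition units can cover a vocabulary without listing every word form.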
A brief outline of the main concepts in morphology, as they affect spoken language lexica, is given here (for more detail a textbook in linguistics should be consulted, e.g. [Akmajian (1984)]):
- Morphology:
- Morphology is the definition of the composition of words as a function of the meaning, syntactic function, and phonological or orthographic form of their parts. The morphology of spoken language is fundamentally the same as the morphology of written language in respect of meaning, syntactic function, and the combinability of morphemes. It differs in respect of morphophonological alternations, which differ from spelling alternations, and in respect of word prosody (for instance word stress patterns). General definitions are given here; examples are given below.
Morphotactics (word syntax) is the definition of the composition of words as a function of the forms of their parts.
Inflection is that part of morphology which deals with the adaptation of words to their contexts within sentences on the basis of agreement (congruence), e.g. between subject and verb.
Word formation is that part of morphology which deals with the construction of words from smaller meaningful parts.
Derivation is that part of word formation which deals with the construction of words by the concatenation of stems with affixes (prefixes and suffixes).
Compounding (composition) is that part of word formation which deals with the construction of words by concatenating words or stems.
- Simple morphological units:
- Traditional terminology varies in this area. A standard but incomplete definition of a morpheme, for instance, is that it is "the minimal meaning-bearing unit of a language". This definition is not entirely satisfactory, however, and for present purposes the sign-based model and the unit of word will be used as the starting point.
A morpheme is the smallest abstract sign-structured component of a word, and is assigned representations of its meaning, distribution and surface (orthographic and phonological) properties. More informally, morphemes are parts of words defined by criteria of form, distribution and meaning; i.e. they have meanings and are realised by orthographic or phonological forms (morphs). They have no internal morphological structure.
Traditionally, the two main kinds of morpheme are:
- Lexical morphemes, characterised by membership of a large, potentially open class, with meanings such as properties and roles of objects, states and events.
- Grammatical morphemes, characterised by membership of a closed class, defined by their distribution with respect to larger units such as sentences or complex words (e.g. inflectional and derivational endings; function words such as prepositions, articles).
Morphs are, in traditional linguistics, the orthographic or phonological forms (realisations) of morphemes. Orthographic morphs consist of graphemes (either single letters or fixed combinations of letters); in traditional phonology, phonological morphs consist of phoneme sequences with a prosodic pattern (e.g. word stress).
Roots or bases (lexical morphs) are the morphs which realise lexical morphemes and inflectable grammatical morphemes, and function as the smallest type of stem in derivation and compounding. Affixes (prefixes, suffixes) are morphs which realise the inflectional and derivational beginnings and endings of words.
A free morph is a morph which can occur on its own, with no affixes or prosodic modifications, as a separate word; a bound morph is a morph (generally an affix) which always occurs together with at least one other morph (typically a stem) in the same word.
- Complex morphological units:
- The structure of words is, like the structure of sentences, defined recursively, since the vocabulary of a language (including new coinages) is potentially unlimited. The functional and formal classification of morphological word structure (compounding and derivation, see above) takes this into account. Where "out of vocabulary words" are likely to be encountered, morphotactic rules and a morphological parser or morphological generator may be required in order to supplement the lexicon. The condition of recursive structure does not apply to inflection, which, given a finite set of stems, defines a finite set of fully inflected word forms (in agglutinative languages possibly an extremely large finite set):
- Inflectional affixation:
- A word (fully inflected word) is a stem morphologically concatenated with a full set of inflectional affixes, e.g. English algorithm + s = algorithms or German ge + segn + et + en "blessed" (plural participle or adjective).
- Derivational affixation:
- A stem is
- either a root (i.e. lexical morph), e.g. tree, algorithm
- or a stem morphologically concatenated with a derivational affix, e.g. algorithm + ic, algorithm + ic + al + ly, non + algorithm + ic + al + ly, etc.
- Compounding:
- A compound word is a word morphologically concatenated with a word or a stem.
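The recursive definition of stems and the non-recursive definition of inflected words can be sketched as a small recogniser over sequences of morphs that have already been segmented. The morph inventories below are toy assumptions, and `is_stem` and `is_word` are hypothetical names; compounding is left out to keep the sketch short.

```python
# Toy morph inventories, assumed for illustration only.
ROOTS = {"algorithm", "tree"}
DERIVATIONAL_PREFIXES = {"non"}
DERIVATIONAL_SUFFIXES = {"ic", "al", "ly"}
INFLECTIONAL_SUFFIXES = {"s"}


def is_stem(morphs):
    """A stem is a root, or a stem concatenated with a derivational affix."""
    if not morphs:
        return False
    if len(morphs) == 1:
        return morphs[0] in ROOTS
    # Recursive case: peel off a derivational suffix or prefix.
    if morphs[-1] in DERIVATIONAL_SUFFIXES and is_stem(morphs[:-1]):
        return True
    return morphs[0] in DERIVATIONAL_PREFIXES and is_stem(morphs[1:])


def is_word(morphs):
    """A word is a stem with its inflectional affixes; in this sketch that
    means a bare stem or a stem followed by one inflectional suffix."""
    if is_stem(morphs):
        return True
    return (len(morphs) > 1
            and morphs[-1] in INFLECTIONAL_SUFFIXES
            and is_stem(morphs[:-1]))


print(is_stem(["non", "algorithm", "ic", "al", "ly"]))  # True
print(is_word(["algorithm", "s"]))                      # True
print(is_stem(["ic", "algorithm"]))                     # False
```

Because the sketch places no restrictions on which affix may follow which stem, it accepts unattested sequences as well; a realistic morphotactic description would add category constraints to each rule.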
- Morphophonological and orthographic alternations:
- The operation of morphological concatenation is defined for present purposes as "concatenation and modification of segments at morph boundaries by boundary phenomena." The details of pronunciation and spelling are altered in morphologically complex items. An example of morphophonological alternation is /f/ - /v/ in knife /naɪf/ - knives /naɪvz/. An example of orthographic alternation is y - i - ie in fly, flier, flies.
These alternants can be described by rules:
- Morphophonological rules are rules (analogous to spelling rules) which describe morphophonological alternations, i.e. the differences between pronunciations of parts of composite words and pronunciations of corresponding parts of simplex words.
- Spelling rules are rules which describe spelling alternations, i.e. the differences between spellings of parts of composite words and the spellings of corresponding parts of simplex words.
A standard technology for formulating spelling rules and morphophonological rules is Two-Level Morphology (cf. [Koskenniemi (1983)], [Karttunen (1983)]; cf. [Ritchie et al. (1992)]).
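A spelling rule for the y - i - ie alternation can be sketched as a boundary adjustment applied when a suffix is attached to a stem. This is a plain rewrite-rule illustration, not an implementation of Two-Level Morphology; the function name `attach` and the exact conditions of the rule are assumptions made for this example.

```python
import re


def attach(stem, suffix):
    """Concatenate stem and suffix, adjusting a stem-final y that
    follows a consonant letter (assumed rule, for illustration)."""
    if re.search(r"[^aeiou]y$", stem):
        if suffix == "s":
            # y -> ie before the suffix s: fly + s = flies
            return stem[:-1] + "ies"
        if suffix and suffix[0] != "i":
            # y -> i before other suffixes not starting with i: fly + er = flier
            return stem[:-1] + "i" + suffix
    # Default: plain concatenation with no boundary modification.
    return stem + suffix


print(attach("fly", "s"))        # flies
print(attach("fly", "er"))       # flier
print(attach("fly", "ing"))      # flying
print(attach("algorithm", "s"))  # algorithms
```

The rule is stated in one direction only (generation); a two-level formulation would instead state the correspondence between lexical and surface strings so that the same description serves both analysis and generation.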
EAGLES SWLG SoftEdition, May 1997.