Comparing the raw size of stored NL and SL data reveals a large
quantitative difference. There are two reasons why SL data require orders of
magnitude more storage space than written language corpora.
The first is simply the difference in coding between text and speech.
Whereas the ASCII string of a word like "and" needs only three bytes, many
more bytes are required as soon as the phonemes of this word are turned into
acoustic output and the AD-converted signal is stored. If, in this example,
we assume that the clear-speech utterance of a three-phoneme syllable takes
about half a second, and we apply an amplitude quantisation of 16 bits and a
mono (non-stereo) hi-fi sampling rate of 48 kHz, the signal occupies
0.5 s x 48,000 samples/s x 2 bytes = 48,000 bytes, so the NL/SL ratio amounts
to approximately 1:16000.
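The arithmetic behind the 1:16000 figure can be checked with a short sketch. It assumes one byte per ASCII character and uncompressed linear PCM (no headers, no compression). The function names `text_bytes` and `pcm_bytes` are illustrative only and do not come from any particular toolkit.

```python
def text_bytes(word: str) -> int:
    # ASCII encoding: one byte per character
    return len(word.encode("ascii"))


def pcm_bytes(duration_s: float, sample_rate_hz: int,
              bits_per_sample: int, channels: int = 1) -> int:
    # Uncompressed linear PCM (assumption): seconds * samples/s * bytes/sample * channels
    return round(duration_s * sample_rate_hz * (bits_per_sample // 8) * channels)


nl = text_bytes("and")            # written form of the word
sl = pcm_bytes(0.5, 48_000, 16)   # half a second, 48 kHz, 16 bit, mono
print(f"NL/SL ratio = 1:{sl // nl}")  # prints "NL/SL ratio = 1:16000"
```

Switching to stereo (`channels=2`) or a higher sampling rate scales the audio size linearly, which widens the gap further.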
The second reason follows from the great variability in the phonetic forms of
spoken words. As pointed out above, if a corpus is intended to reflect some
common sources of variability, any written text must be reproduced by many
speakers in more than one speaking style (at least at slow, normal and fast
speeds, in a low, normal and high voice, etc.).
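Because each combination of speaker and speaking style needs its own recording, these factors multiply the storage cost rather than add to it. The sketch below illustrates this. The three speeds and three voice levels are taken from the text, while the panel of ten speakers is a purely hypothetical figure. Further factors covered by "etc." (dialect, recording channel, and so on) are ignored here. The helper names are illustrative.

```python
from itertools import product

# Speaking-style dimensions named in the text; real corpora may use more
RATES = ("slow", "normal", "fast")
VOICES = ("low", "normal", "high")


def recording_conditions(speakers, rates=RATES, voices=VOICES):
    # Every (speaker, rate, voice) combination is a separate reading of the text
    return list(product(speakers, rates, voices))


def corpus_audio_bytes(one_reading_bytes, speakers, rates=RATES, voices=VOICES):
    # Total audio = size of one reading * number of recording conditions
    return one_reading_bytes * len(recording_conditions(speakers, rates, voices))


speakers = [f"spk{i:02d}" for i in range(10)]   # hypothetical panel of 10 speakers
total = corpus_audio_bytes(48_000, speakers)    # 48,000 bytes per reading of "and"
print(len(recording_conditions(speakers)), total)  # prints "90 4320000"
```

Under these assumptions the three-byte word "and" already corresponds to about 4.3 MB of audio, which pushes the NL/SL ratio from roughly 1:16000 to roughly 1:1,440,000.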