Printable ASCII-strings and continuously sampled speech



Taken as pure data, written texts in European languages consist of strings of printable alphanumeric and other characters, coded as 7- or 8-bit ASCII bytes. The resulting NL strings already possess a characteristic information structure that is not available in primary SL data. ASCII strings are grouped into lexical substrings separated by blanks, punctuation marks or control codes; the explicit punctuation of phrases and sentences is a further important property of NL data. None of this information can be found in recordings of primary SL data, since natural speech contains no ASCII elements representing word boundaries, full stops, commas, colons, or quotation, question and exclamation marks. Recorded SL data are primarily nothing but digitised time functions.
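The contrast can be illustrated with a minimal sketch. The example sentence, the regular expressions, the 16 kHz sampling rate and the synthetic 440 Hz tone standing in for a recording are all assumptions made for illustration and are not taken from the handbook. The point is only that the text string yields word and sentence units through simple delimiter rules, whereas the sampled signal is a plain sequence of amplitude values with no boundary markers in it.

```python
import math
import re

# NL data: an ASCII string already carries word and sentence boundaries.
# (The sentence is an invented example.)
text = "Is this speech? No, it is text."

# Lexical substrings: runs of alphanumeric characters between delimiters.
words = re.findall(r"[A-Za-z0-9]+", text)

# Sentence units: split at whitespace that follows explicit end punctuation.
sentences = [s for s in re.split(r"(?<=[.?!])\s+", text) if s]

# SL data: a digitised time function. A synthetic 440 Hz tone sampled at
# 16 kHz stands in for a real recording (both values are assumptions).
SAMPLE_RATE = 16000
samples = [
    round(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(160)  # 10 ms of signal
]

print(words)        # word units recovered from delimiters alone
print(sentences)    # sentence units recovered from punctuation alone
print(samples[:8])  # only integer amplitudes; no delimiter symbols exist
```

Any segmentation of the sample sequence into words or phrases has to be derived by signal analysis or manual annotation, since nothing in the values themselves marks a boundary.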

EAGLES SWLG SoftEdition, May 1997.