Documentation mining: selected tools for online graduate teaching

Dafydd Gibbon, 2014-05-25

The fields of documentary linguistics, text technology, speech technology and digital humanities share the strategy of acquiring, archiving and disseminating sustainable, re-usable and interoperable resources of text and speech data and tools. The obvious question arising from the deployment of this strategy is: What to do with these archives?

The following examples present a number of different tools for basic online documentation mining, i.e. the use of existing data compilations (in these cases relatively informally compiled legacy data printed on paper) for state-of-the-art graduate teaching purposes.

DistGraph (Visualisation of differences as distances)
An online classification and visualisation tool for similarity relations, here applied to consonant inventories of Kru languages (Côte d'Ivoire).

TGA (Time Group Analysis)
An online analysis and visualisation tool for the analysis of annotated speech data, here applied to Tem (Gur>Niger-Congo, Togo).

Syllables
An online visualisation tool for modelling lexical syllable descriptions as a transition graph, here applied to the Mandarin pinyin table.

ULex (Ubiquitous Lexicon)
An online lexicographic tool for extracting dictionary information from texts, here applied to the Uyghur translation of the UN Declaration of Human Rights.
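
To give students a feel for what extracting dictionary information from a text can involve, the following is a minimal Python sketch that builds a frequency wordlist with short contexts for each word. It is not the ULex implementation: the function name build_wordlist, the context parameter and the simple regular-expression tokenisation are assumptions made for illustration, and a real orthography may need more careful tokenisation. The sample sentence is the opening of Article 1 of the English version of the Declaration.

```python
import re
from collections import Counter, defaultdict


def build_wordlist(text, context=2):
    """Return (frequencies, contexts) for the word tokens in text.

    Hypothetical helper for illustration only; not part of any tool
    named in this document.
    """
    # \w+ matches runs of Unicode word characters; this is a crude
    # tokeniser and is only an assumption about what counts as a word.
    tokens = re.findall(r"\w+", text.lower())
    freq = Counter(tokens)

    # For every token, store each occurrence with up to `context`
    # neighbouring tokens on either side (a small keyword-in-context line).
    contexts = defaultdict(list)
    for i, tok in enumerate(tokens):
        left = tokens[max(0, i - context):i]
        right = tokens[i + 1:i + 1 + context]
        contexts[tok].append(" ".join(left + [f"[{tok}]"] + right))
    return freq, contexts


sample = "All human beings are born free and equal in dignity and rights."
freq, contexts = build_wordlist(sample)

# List entries by descending frequency, then alphabetically.
for word, count in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
    print(word, count, contexts[word][0])
```

The frequency count gives a first candidate headword list, and the stored contexts provide example material for each entry; both are starting points for manual lexicographic work rather than a finished dictionary.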