an endangered languages documentation initiative
(provisional repository)

Firmin Ahoua Université de Cocody, Abidjan, Côte d'Ivoire
Bruce Connell York University, Toronto, Canada
Dafydd Gibbon Universität Bielefeld (coordinator)

U Bielefeld, Winter Semester 2002: Lecture series on endangered languages.

We gratefully acknowledge

  • the warm welcome and energetic support of the Ega community, with whose kind permission this archive has been created,
  • stimulating cooperation with many colleagues and students,
  • infrastructure support by Universität Bielefeld (Germany), Université de Cocody (Abidjan, Ivory Coast), York University (Toronto, Canada),
  • funding from the VWS DOBES pilot phase, the DAAD, the DFG and the EC (DG13) for different aspects of our computational language documentation work.
All materials are copyright by their authors and originators and may only be used with permission (contact Dafydd Gibbon). The archive is made available in the interests of the Ega community, the cause of workable and efficient endangered language documentation, and the advancement of scholarship in this field.


Goals and description

The Ega initiative aims to develop a model for the computational language documentation of an endangered language, involving

  • design and performance of fieldwork & documentation of the forms & functions of an endangered language,
  • based on workable, efficient computational linguistic & phonetic methods (see article on the WELD paradigm), and
  • including the development of appropriate software: fieldwork, description and documentation support tools.

The focus is on Ega, an endangered language isolate so far assigned to the Kwa family and surrounded by a language of the Kru family (Dida), in the Ivory Coast, West Africa.

The language is called "Ega" by most speakers and by scientists, and "Diès" by Dida speakers and some Ega speakers. Ega is the westernmost of the Kwa languages, the only Kwa language west of the Bandama River, and an isolate in the Nyo cluster.

The average age of fluent speakers is high, and the degree of intermingling (intermarriage and bidirectional migration) with the enclaving Dida-speaking population is also high. In school, children speak French in the classroom and, typically, a regional vehicular language such as Dioula (Jula) with their schoolfriends, which weakens intergenerational transmission of the language.

An Ega greeting

from Oko Towe Cyprien

A glimpse of everyday life

An Ega mother fetching firewood.

Requirements specification

The Ega documentation model specifies the production of documentation which is empirically and hermeneutically complete. This means that the documentation should be sufficiently detailed at empirical (essentially: acoustic and visual), structural (essentially: grammatical and lexical) and functional (essentially: interpretative semantic and pragmatic) levels, to enable the documentation to be used effectively by the descendants of the community and by scientists in the permanent absence of native speakers.

The model is based on four generic methodological requirements:

  1. Empirical: combining annotated audio & video recordings of situated and interview data with written data.
  2. Descriptive: dealing with phonetics, prosody, lexicon, grammar, semantics, discourse from linguistically & ethnographically informed perspectives.
  3. Formal: descriptions suitable for both human (lay and scientific) and computational use, and documents structured in well-defined formats meeting standards of spoken language engineering & text technology.
  4. Infrastructural: using portable and low cost technologies, and maximally available to interested communities as open language data and open source software.


Scientific interest in text technology and computational documentation of languages is a major feature of work at Universität Bielefeld and at Université de Cocody, Abidjan (Département de Linguistique and Institut de Linguistique Appliquée), Côte d'Ivoire. Beyond the intrinsic interest in cooperating with partners in an endangered language community, the scientific interest in Ega concerns the linguistic richness of the language, including marked traces of historically old Niger-Congo features, and the sociolinguistic reasons for the rapid decline in the number of speakers.

The Ega initiative is designed as part of a long-term cooperation and is currently structured in three phases:

  • Preliminary investigation by Ahoua and Dago (1997), and Ahoua and Connell (1998), partly taking up work by Rémy Bole Richard in the 1980s.
  • DOBES Pilot Phase project "Ega: a documentation model for an endangered Ivorian language" (2000-2001).
  • Continuing fieldwork and documentation within the framework of the Cooperation Agreement between the Universities of Abidjan and Bielefeld (2002 - ...) in the context of research projects on computational linguistics and text technology (DAAD, DFG).

Currently the systematisation of further data types is in progress. Data will be made available on this basis in internationally compliant formats for signal and text annotation in cooperation with dedicated dissemination agencies and portals such as E-MELD.

Ega Documentation Model: Interim Results


Catalogues (for Metadata discussion see Presentations and Reports):

  • Ega Data Catalogue 2001 (Fieldwork March 2001; legacy data)
  • The 2002 Catalogue will appear shortly (Fieldwork March 2002).

Phonetic documentation:

  • Bruce Connell, Firmin Ahoua, Dafydd Gibbon (2002). "Illustrations of the IPA: Ega." Journal of the International Phonetic Association 32/1, 99-104.
    Partial set of recordings. Owing to circumstances beyond our control, we currently have no access to our Ega language partners. We will publish a complete set as soon as possible.
  • Ega phonemic transcription conventions (in standard and simplified X-SAMPA format)

Prosodic documentation:

Lexical documentation:

Grammatical documentation:

Semantic documentation:

  • (in preparation)

Discourse documentation:

Sample of Ega formal speech (speaker: Oko Towe Cyprien)
Sample of Ega narrative speech (speaker: Gnaoure Grogba Marc). To convert the TASX XML format into Praat format for hearing and viewing the annotation in Praat, see Tools, below. The video file will be distributed shortly and can be viewed together with the annotation in the TASX Annotator (see screenshot).
Gnaoure Grogba Marc,
storyteller of the village of Gniguedougou,
former village chief.



Zipped PPT:
ISLE Workshop on computational lexicons, Pisa, 2001
Poznan Linguistic Meeting, May 2001
Bielefeld DOBES Workshop, November 2001
DGfS Annual Meeting, Workshop "Multilingualism and Language Endangerment"
E-MELD Workshop on Digitization of Lexical Information, 3-5 August 2002, EMU, Ypsilanti, MI, USA


These technical documents were produced and distributed as deliverables for the DOBES pilot phase project "Ega: a documentation model for an endangered Ivorian language".

Project reports:

Metalex specifications (html, PDF, PostScript)
PAC specifications (html, PDF, PostScript)
Lexicon questionnaire (html, PDF, PostScript)

Language Documentation Software and Lingware:

Source       | Software/Lingware                                                           | Sample                    | Documentation
U Bielefeld  | Portable Audio Concordance Online                                           |                           | PAC specifications (see report)
U Bielefeld  | dobes_pac_ctt.dtd                                                           | dobes_pac_ctt_sample.xml  | PAC XML specifications
MPI Nijmegen | dobes_tiers_V1.dtd                                                          | dobes_tiers_V1_sample.xml | dobes_tiers_V1_dtd.doc
U Bielefeld  | Format conversion tools (freeware for Praat, Transcriber and TASX formats)  |                           |


Ega proposal, October 1999 (DOBES Pilot Phase; accepted June 2000)
Ega proposal, August 2001 (DOBES Main Phase; not accepted March 2002)



  • Reports in: Frankfurter Allgemeine Sonntagszeitung, Züricher Tagesanzeiger, GEO, Die ZEIT
  • Interviews and references on: Deutsche Welle, Sender Freies Berlin, Deutschlandradio, National Public Radio (USA)
  • Full version of a slightly abridged reader's letter to DIE ZEIT (14 March 2002) in response to an article by Urs Willmann in DIE ZEIT #10, 28 February 2002, p. 31. Unfortunately, Willmann's original article does not appear to be listed in the online archive of DIE ZEIT.


Kickoff workshop talk

Free software downloads for working with the Ega archive

Download Praat
(de facto standard software used for audio annotation).

Download TASX Annotator
(Open source Java software used for video/audio signal annotation in a generic XML format).

Download PDF reader.

Download MS PowerPoint Viewer.

(Also check our format conversion tools.)

Dafydd Gibbon
Created: Fri Jun 23 19:49:55 MET DST 2000
Last updated: Sat Sep 21 14:46:31 MEST 2002