EGA: Documentation model for an endangered Ivorian language

Project description, methods, and scheduling


Firmin Ahoua, Bruce Connell, Dafydd Gibbon


http://coral.lili.uni-bielefeld.de/EGA/


DOBES Workshop

Nijmegen, 15-16 Sept. 2000



Summary

The pilot project aims to develop model documentation for the endangered language Ega. Ega is the westernmost Kwa language, a linguistic isolate forming an enclave in Kru-speaking territory, with a few hundred remaining speakers and an unusual linguistic complexity which promises new insight into the history of the Niger-Congo languages. The documentation will build on resource types developed by the proposers in language documentation and technology projects, and on the few extant Ega resources. In cooperation with the multimedia database project, a standardised model interactive questionnaire and other survey-oriented structured document types (e.g. hyperlexicon and hypertext concordance) will be specified to implementation standard. This will be done in close cooperation with linguists of the Université de Cocody, Abidjan, and a network of international specialists. Full details of this and other aspects of the project are available at the project website noted above.
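The hypertext concordance mentioned above can be pictured as a keyword-in-context (KWIC) index whose lines link back to the source utterances. The following Python sketch is illustrative only. The function names `kwic` and `to_html`, the utterance identifiers, the placeholder tokens and the context width are assumptions, not project specifications.

```python
import html
from collections import defaultdict

def kwic(utterances, width=2):
    """Build a keyword-in-context index (hypothetical helper).

    utterances: dict mapping an utterance id to its list of tokens.
    Returns a dict mapping each token to a list of
    (utterance_id, left_context, keyword, right_context) tuples.
    """
    index = defaultdict(list)
    for utt_id, tokens in utterances.items():
        for i, token in enumerate(tokens):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            index[token].append((utt_id, left, token, right))
    return dict(index)

def to_html(index):
    """Render the index as an HTML table; each line links to its utterance id."""
    rows = []
    for keyword in sorted(index):
        for utt_id, left, kw, right in index[keyword]:
            uid = html.escape(utt_id, quote=True)
            rows.append(
                f"<tr><td>{html.escape(left)}</td><td><b>{html.escape(kw)}</b></td>"
                f'<td>{html.escape(right)}</td><td><a href="#{uid}">{uid}</a></td></tr>'
            )
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

For example, `to_html(kwic({"utt1": ["w1", "w2", "w3", "w1"]}, width=1))` lists both occurrences of the placeholder token `w1` with one word of context on each side. In a full system, each link would lead to the time-aligned recording rather than to a bare identifier.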


The Ega language

Ega, spoken in east central Côte d'Ivoire, is the westernmost Kwa-related language (Niger-Congo phylum). It is an isolate within the Dida (Kru)-speaking area, with no known closely related languages. The classification is tentative and somewhat controversial. The influence of Kru (Bete and Godie) is strong, and a minority view holds that Ega is not Kwa but a true Niger-Congo isolate, a remnant language. Ega has more complex phonetics, phonology and morphology than other Kwa languages, which indicates at least the presence of reflexes of archaic stages of Kwa language development.


Several social and political factors contribute to the endangerment of Ega. What makes Ega a prime candidate as the focal point of a model documentation project, however, is that preliminary investigations indicate the following linguistic properties, which in themselves justify careful and systematic documentation before the language becomes extinct:

  1. The most complete and still active series of contrastive voiced implosive consonants, including palatal and labio-velar implosives, which contrast with non-implosives. Mbatto (Kwa) has similar features, but is in the process of losing the contrast.

  2. The most complete nine-vowel ATR harmony system, which operates consistently within prosodic words.

  3. Complex vowel hiatus, described in a recent MA thesis (Dago 1999).

  4. The most complete set of nominal and gender class prefixes (Bolé-Richard 1983). Such prefixes are widely attested in Benue-Congo languages, whereas nominal classes appear to have been almost entirely lost in other Kwa languages. This leads to the hypothesis that documentation of Ega has the potential to make a significant contribution to the understanding of the unity of the Niger-Congo language family.

  5. A complex conjugation with an intricate tense/aspect system rarely documented among Kwa languages (Bolé-Richard 1982).
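Property 2 suggests a simple consistency check for transcribed wordlists. Every vowel in a prosodic word should carry the same ATR value, so forms that mix values can be flagged for re-checking. The sketch below rests on three assumptions:

* Segments are X-SAMPA symbols that have already been tokenised.
* Diacritics (e.g. tone `_H`, implosion `_<`) follow an underscore and can be stripped.
* The vowels follow a generic nine-vowel split: +ATR i e o u, -ATR I E O U a.

This split is chosen for illustration and is not the documented Ega inventory. In particular, treating /a/ as [-ATR] is an assumption. The example forms are invented.

```python
# Assumed nine-vowel ATR classes in X-SAMPA (illustrative, not Ega's documented
# inventory): +ATR i e o u; -ATR I E O U a, with /a/ treated as [-ATR].
PLUS_ATR = {"i", "e", "o", "u"}
MINUS_ATR = {"I", "E", "O", "U", "a"}

def atr_values(segments):
    """Return the set of ATR values ('+', '-') found among a word's vowels.

    segments: list of X-SAMPA segment symbols (already tokenised). Anything
    after the first underscore (tone or other diacritics) is stripped first.
    """
    values = set()
    for seg in segments:
        base = seg.split("_")[0]
        if base in PLUS_ATR:
            values.add("+")
        elif base in MINUS_ATR:
            values.add("-")
    return values

def is_harmonic(segments):
    """True if all vowels in the prosodic word share a single ATR value."""
    return len(atr_values(segments)) <= 1

def flag_disharmonic(words):
    """Return the forms whose vowels mix +ATR and -ATR values."""
    return [w for w in words if not is_harmonic(w)]
```

Flagged forms are candidates for checking with consultants. They are not automatically errors, since loanwords, compounds or neutral vowels may legitimately break the pattern.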


Materials 1: Descriptions of Ega


The materials which are available to date are somewhat fragmentary, and consist of isolated (and unreliable) comments in more general works, and the following specific material:


  1. Ahoua, Firmin and Bruce Connell. 1999. 120 minutes of laryngograph and speech signal recordings of a set of Ega implosives, with 4 native speakers.

  2. Ahoua, Firmin and Bruce Connell. 1999. 10-minute video film (Sony camcorder) recording larynx movement using a mirror.

  3. Bolé-Richard, Rémy. 1982. L'Ega. In Georges Hérault (ed.), Atlas des langues Kwa, Vol. 1. Abidjan: ILA.

  4. Bolé-Richard, Rémy. 1983. La Classification Nominale en Ega. Journal of West African Languages XIII, 1.

  5. Bolé-Richard, Rémy. 1983. A word list of Ega. Unpublished ms.

  6. Bolé-Richard, Rémy. 1982. Ega 350-word list. In Georges Hérault (ed.), Atlas des langues Kwa, Vol. 2. Abidjan: ILA.

  7. Dago, Georgette. 1999. The Phonology of Ega. Mémoire de Maîtrise, Abidjan. [Supervised by Firmin Ahoua; contains an extension of the Bolé-Richard (1983b) word list, totalling approximately 1000 words.]

  8. Oko, Touré Cyprien. 2000. Interview sur la langue Ega. Conducted by Dafydd Gibbon and Eddy Aimé Gbéry, August 2000. Département de Linguistique, Université de Cocody, Abidjan, Côte d'Ivoire. 2-hour DAT tape, 3 CD-ROMs.


These materials will be systematically processed according to the methods outlined in the pilot project work programme. Further fieldwork will also be conducted with the cooperation of our consultants and other colleagues at Université de Cocody, Abidjan. Its aim is to collect:

  1. additional primary data types, as specified;

  2. secondary data types (a 2000-word lexicon, phonology, morphology, syntax);

  3. basic ethnographic and sociolinguistic information (see following sections).


Materials 2: Sociolinguistic consultant questionnaire


(Adapted from Vossen 1988, Connell 1998, Schaefer and Egbokhare 1999)

Village

  1. District

  2. Village

  3. Name of Chief

  4. Population size

  5. Names of Ethnic groups/clans resident in the village:

  6. Names of languages spoken and known by the village population:

  7. Language(s) of intergroup communication:

  8. Language(s) used as medium of instruction in school: (years 1 - 3, years 3+):

  9. Village description:

     a. location on map

     b. institutions (e.g. schools, clinics, religious institutions, water pumps)

     c. main occupations of residents

     d. residential distribution of population

     e. links with nearest towns

     f. notes on village history


Household


  1. District:

  2. Village:

  3. Residential Area:

  4. Husband's ethnic group/clan:

  5. Wife's ethnic group/clan:

  6. Husband's occupation:

  7. Wife's occupation:

  8. Religion:

  9. Language knowledge among family members:


              | Adults | Children (5-15) | Children (1-5)
    ----------+--------+-----------------+---------------
    1st lang  |        |                 |
    2nd lang  |        |                 |
    3rd lang  |        |                 |
    4th lang  |        |                 |

  10. Languages or dialects spoken at home:


              | Adults | Children (5-15) | Children (1-5)
    ----------+--------+-----------------+---------------
    1st       |        |                 |
    2nd       |        |                 |
    3rd       |        |                 |
    4th       |        |                 |

  11. Languages or dialects used outside the home with speakers of the same ethnolinguistic group:


              | Adults | Children (5-15) | Children (1-5)
    ----------+--------+-----------------+---------------
    1st       |        |                 |
    2nd       |        |                 |
    3rd       |        |                 |
    4th       |        |                 |

  12. Languages or dialects used outside the home with speakers of a different ethnolinguistic group:


              | Adults | Children (5-15) | Children (1-5)
    ----------+--------+-----------------+---------------
    1st       |        |                 |
    2nd       |        |                 |
    3rd       |        |                 |
    4th       |        |                 |
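Once the household grids (questions 9 to 12) are filled in, they can be tabulated. The sketch below assumes that each grid is stored as a mapping from age group to the languages ranked 1st to 4th. The share of households in which a given language is ranked first, for adults and for young children, then gives a rough indication of intergenerational transmission. The class name, field names and sample responses are invented for illustration and are not part of the questionnaire.

```python
from dataclasses import dataclass, field
from typing import Dict, List

AGE_GROUPS = ("adults", "children_5_15", "children_1_5")  # grid columns

@dataclass
class HouseholdGrid:
    """One filled-in grid: languages ranked 1st-4th per age group (hypothetical schema)."""
    household_id: str
    ranked: Dict[str, List[str]] = field(default_factory=dict)

    def __post_init__(self):
        for group, languages in self.ranked.items():
            if group not in AGE_GROUPS:
                raise ValueError(f"unknown age group: {group}")
            if len(languages) > 4:
                raise ValueError("the grid has room for at most four languages")

def first_language_share(grids, language, age_group):
    """Fraction of households (with an answer for age_group) ranking `language` first."""
    answered = [g for g in grids if g.ranked.get(age_group)]
    if not answered:
        return 0.0
    hits = sum(1 for g in answered if g.ranked[age_group][0] == language)
    return hits / len(answered)
```

Suppose that, with invented data, the share for `"adults"` is clearly higher than the share for `"children_1_5"`. That would point to reduced transmission. Such a figure only complements the qualitative answers; it does not replace them.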


School

  1. What is your name?

  2. a. How old are you? b. What year are you in at school?

  3. Are you a boy or a girl?

  4. In which village were you born?

  5. How long have you lived in ...............?

  6. What is your language?

  7. Who is your father?

  8. What is his language?

  9. Who is your mother?

  10. What is her language?

  11. Which languages are spoken in your village?

  12. What is your first language?

  13. What other languages do you speak?

  14. Which language do you speak best?

  15. Which language do you prefer to speak?

  16. What other languages do you understand?

  17. Which language do you speak at home/with your mother? With your father?

  18. Which language do you speak with your brothers and sisters?

  19. Which language do you speak with your friends?

  20. Which language do you speak with strangers?

  21. Which language do you speak when greeting seniors or elderly people?

  22. Which language do you speak when joking?

  23. Which language do you speak when you are angry?

  24. Which language do you speak at the market?

  25. Which language do you speak in the fields?

  26. Which language is spoken at church?

  27. Which language is spoken at school?

  28. Would you like your language to be used at school?

  29. Would you like to know how to write your language?

  29. Est-ce que vous voulez savoir écrire votre langue?


Materials 3: Language questionnaire


One of the basic documentation materials is an outline questionnaire on language typology. It was developed from 1997 to 2000 in a joint DAAD project with Université de Cocody, Abidjan, Côte d'Ivoire. The original questionnaire contained outlines of different documentation levels for ethnological and linguistic analyses. Within the linguistic section, it had outlines for language families, languages and major varieties, dialects and sociolects. The questionnaire has several functions in the present project:

  1. to support efficient initial fieldwork elicitation and interview-based analysis;

  2. to act as a checklist for the transcription, annotation and analysis of recordings in cooperation with language consultants;

  3. to act as a preliminary specification for document design at the document architecture level;

  4. to be used as a training document for consultants and graduate students at Université de Cocody for later deployment in the documentation of other endangered languages.

The questionnaire document type is not used as a 'Procrustean bed'; instead, linguistic experts in the project adapt it to specific languages. This is the first task commissioned from the local scientific consultants at Université de Cocody.
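Function 3 treats the questionnaire outline as a preliminary document architecture. A minimal sketch of that step turns the numbered outline into nested XML. Numbered headings become sections at the depth given by their number. Unnumbered lines become items of the most recent section. The sketch uses Python's standard `xml.etree.ElementTree`; the element and attribute names (`questionnaire`, `section`, `item`, `number`, `title`) are hypothetical, not a project schema.

```python
import xml.etree.ElementTree as ET

def outline_to_xml(lines):
    """Convert a numbered outline (e.g. '1.3.1 Immediate genetic affiliation')
    into nested XML. Unnumbered lines become <item> children of the most
    recent section. Element and attribute names are illustrative only."""
    root = ET.Element("questionnaire")
    stack = [(0, root)]  # (depth, element); the root has depth 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        head, _, rest = line.partition(" ")
        parts = head.split(".")
        if all(p.isdigit() for p in parts):
            depth = len(parts)  # '1' -> 1, '1.4' -> 2, '1.4.1' -> 3
            while stack[-1][0] >= depth:
                stack.pop()  # close sections at the same or a deeper level
            section = ET.SubElement(stack[-1][1], "section",
                                    number=head, title=rest.strip())
            stack.append((depth, section))
        else:
            item = ET.SubElement(stack[-1][1], "item")
            item.text = line
    return root
```

`ET.tostring(outline_to_xml(lines), encoding="unicode")` then gives a serialised skeleton. A document type definition or schema for the questionnaire could be developed from it.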


Linguistic documentation: level of the speech variety


1 Situation of the language

1.1 Terminological situation: glossonymy

1.2 Ethnographic situation

1.3 Genetic situation

1.3.1 Immediate genetic affiliation

1.3.2 Subdivision into dialects

1.4 Sociolinguistic situation

1.4.1 Internal sociolinguistic situation

Number of primary speakers

Vitality

Social stratification

Social groups

Neighbourhoods

Occupations

Communicative conventions

Terms of address

Social functions

Specialised languages

Variants

Activity schemas

Macro-formats

Communicative practices

Pragmatic routines

Paralinguistic conventions

Literature

Degree of standardisation and modernisation

Neology

1.4.2 External sociolinguistic situation

Competing languages and multilingualism

Degree of linguistic knowledge/instrumentalisation/description

Status of the language

Language attitudes

1.5 Historical situation

1.5.1 Earlier stages

1.5.2 Earlier contacts

1.5.3 Recent changes


2 System of the language

2.1 Phonology

2.1.1 Phonetic system

Consonants

Vowels

2.1.2 Tone system

2.1.3 Phonotactic system

2.1.4 Phonological processes

2.2 Morphology

2.2.1 Word classes

2.2.2 Inflection

Pronominal

Nominal

Verbal

2.2.3 Word formation

Derivation

Compounding

2.3 Syntax

2.3.1 Sentence types

Declarative

Interrogative

Imperative

Exclamative

2.3.2 Simple sentence

Verbal

Nominal

Possessive

Existential-locative

2.3.3 Noun phrase

Possessive modifier

Attributive adjective

Quantifier

Numeral

Determiner

2.3.4 Verb phrase

Arguments (actants)

Adjuncts (circumstantials)

2.3.5 Complex sentence

Substantive (complement) clause

Attributive (relative) clause

Adverbial clause

Verb serialisation

2.4 Lexicon

2.4.1 Basic vocabulary

2.4.2 Numeral system

2.4.3 Kinship system

2.4.4 Onomastics

Anthroponymy (personal names)

Toponymy (place names)

Hydronymy (names of bodies of water)


3 Texts

3.1 Sample texts

3.2 Sample analysis


4 Bibliography


Materials 4: Task-oriented and other dialogue types

A number of dialogue types designed to elicit specific linguistic and communicative patterns will be used (e.g. blocks worlds, TinkerToy construction dialogues), as well as other descriptive, negotiative and ritual dialogue types.

Analysis and Labelling Techniques: multi-tier signal annotation

The main activity of the pilot documentation project is the collection and partial annotation of a variety of selected data types at a number of different levels, centring on the digital speech signal. The physical speech signal is, of course, the basis of all fieldwork: the field linguist perceives and analyses it, with or without electronic recording. However, very few speech signal corpora are available for minority languages, and in particular for endangered languages. Such corpora are scarce even compared with the small amounts of transcribed or other textual data. For this reason, speech data collected in the project will be labelled at a fine (at least syllabic) level of granularity.
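The archived signals are to be labelled in Praat, so syllable-level and other tiers can be exported as Praat TextGrid files. The sketch below writes the long text format with interval tiers only. Tier names and times are illustrative. The layout follows files saved by Praat as we understand them, so the output should be checked by opening it in Praat. Praat interval tiers must cover the whole time domain, so gaps are filled with empty intervals.

```python
def fill_gaps(intervals, xmin, xmax):
    """Return (start, end, label) intervals covering [xmin, xmax] contiguously,
    inserting empty labels in gaps; overlapping intervals are rejected."""
    filled, t = [], xmin
    for start, end, label in sorted(intervals):
        if start < t:
            raise ValueError(f"interval starting at {start} overlaps the previous one")
        if start > t:
            filled.append((t, start, ""))
        filled.append((start, end, label))
        t = end
    if t < xmax:
        filled.append((t, xmax, ""))
    return filled

def to_textgrid(tiers, xmax, xmin=0.0):
    """Serialise {tier_name: [(start, end, label), ...]} as a Praat TextGrid
    in the long text format (interval tiers only)."""
    out = ['File type = "ooTextFile"', 'Object class = "TextGrid"', "",
           f"xmin = {xmin}", f"xmax = {xmax}", "tiers? <exists>",
           f"size = {len(tiers)}", "item []:"]
    for i, (name, intervals) in enumerate(tiers.items(), start=1):
        filled = fill_gaps(intervals, xmin, xmax)
        out += [f"    item [{i}]:", '        class = "IntervalTier"',
                f'        name = "{name}"', f"        xmin = {xmin}",
                f"        xmax = {xmax}", f"        intervals: size = {len(filled)}"]
        for j, (start, end, label) in enumerate(filled, start=1):
            text = label.replace('"', '""')  # Praat doubles embedded quotes
            out += [f"        intervals [{j}]:", f"            xmin = {start}",
                    f"            xmax = {end}", f'            text = "{text}"']
    return "\n".join(out) + "\n"
```

A call such as `to_textgrid({"syllable": [(0.1, 0.3, "s1"), (0.3, 0.5, "s2")], "word": [(0.1, 0.5, "w1")]}, xmax=0.6)` produces two tiers over the same time axis. The string can then be saved as a UTF-8 `.TextGrid` file.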

Data Matching Principle (DMP): Data representations of the same speech signal as a physical event with temporal extent and a particular location are mapped both to the signal and to each other.
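One way to make the DMP operational assumes that every representation carries start and end times in seconds on the recording's time axis. Both mappings can then be derived from time:

* to the signal, by converting times into sample indices;
* to other representations, by temporal overlap.

The `Annotation` type, the tier names and the sample rate below are illustrative assumptions.

```python
from typing import List, NamedTuple, Tuple

class Annotation(NamedTuple):
    tier: str     # e.g. "syllable", "word" (illustrative tier names)
    start: float  # seconds from the start of the recording
    end: float
    label: str

def to_samples(ann: Annotation, sample_rate: int) -> Tuple[int, int]:
    """Map an annotation onto the signal as a half-open sample range [first, last)."""
    return round(ann.start * sample_rate), round(ann.end * sample_rate)

def overlapping(ann: Annotation, others: List[Annotation]) -> List[Annotation]:
    """Map an annotation onto another tier: annotations sharing some time with it."""
    return [o for o in others if o.start < ann.end and ann.start < o.end]
```

Annotations that merely share a boundary are not counted as overlapping. This keeps a syllable that starts exactly where a word ends from being assigned to that word.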

Informally, the principle (once stated) seems rather obvious. Both formally and operationally, however, it is far from easy to fulfil. Starting from the DMP, we define a data matching procedure as a procedure for comparing data obtained by different methodologies. Examples of data matching in this sense are:
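One simple comparison of this kind is offered here as a hypothetical example, not as one of the project's own procedures. It checks whether segment boundaries placed by two methods agree within a tolerance. The two methods might be, for instance, manual labelling of the acoustic signal and boundaries derived from laryngograph data. The 20 ms default and the greedy one-to-one pairing are assumptions.

```python
def match_boundaries(ref, hyp, tolerance=0.02):
    """Pair boundary times (seconds) from two methods one-to-one when they lie
    within `tolerance` of each other, scanning both lists in time order.
    Returns (pairs, unmatched_ref, unmatched_hyp)."""
    ref, hyp = sorted(ref), sorted(hyp)
    pairs, unmatched_ref, unmatched_hyp = [], [], []
    i = j = 0
    while i < len(ref) and j < len(hyp):
        if abs(ref[i] - hyp[j]) <= tolerance:
            pairs.append((ref[i], hyp[j]))
            i += 1
            j += 1
        elif ref[i] < hyp[j]:
            unmatched_ref.append(ref[i])  # no partner near this reference boundary
            i += 1
        else:
            unmatched_hyp.append(hyp[j])  # no partner near this hypothesis boundary
            j += 1
    unmatched_ref.extend(ref[i:])
    unmatched_hyp.extend(hyp[j:])
    return pairs, unmatched_ref, unmatched_hyp

def agreement(ref, hyp, tolerance=0.02):
    """Proportion of all boundaries (from both methods) that found a partner."""
    pairs, _, _ = match_boundaries(ref, hyp, tolerance)
    total = len(ref) + len(hyp)
    return 2 * len(pairs) / total if total else 1.0
```

The unmatched boundaries are often more useful than the agreement score. They point to the places where the two representations of the same signal diverge and need inspection.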


Project organisation and scheduling

The project is organised as a sub-consortium between Universität Bielefeld and Université de Cocody, Abidjan, Côte d'Ivoire. Its consortial character reflects two things: the collaboration among the three principal participants and their institutions, and the international network of scholars and institutions who have expressed interest in cooperating:

Principal participants:

Dr. Firmin Ahoua (Université de Cocody, Abidjan, Côte d'Ivoire, local coordinator)

Dr. Bruce Connell (Oxford University, fieldwork coordinator)

Prof. Dafydd Gibbon (Universität Bielefeld, project director)

Tasks and involvement of local scientific consultants (Université de Cocody, Abidjan, Côte d'Ivoire):

Phonology, phonetics: Dr. Ahoua, Dr. Connell

Lexicography: Dr. Gbéry, Prof. Gibbon

Functional syntax, discourse: Prof. Kouadio

Structural syntax: Prof. Mel

Morphology: Prof. Yago (Directeur du Département), Prof. Gibbon

Technical direction (computation, formats, software tools, databases, electronic publishing):

Prof. Gibbon

Thorsten Trippel, M.A.

Dipl. Ing. Soma Ouattara M.Sc. (Institut des Recherches en Mathématique Appliquée, Université de Cocody, Abidjan, Côte d'Ivoire)


Project organisation 1: Timeline


Participants (columns of the timeline): Gibbon; 2 UBI students; Connell; Ahoua; ABJ consultants; ABJ secretary; n Ega assistants; 3 ABJ students.

Pre-project

1.9.00-30.09.00
    Gibbon: Nijmegen workshop; order equipment; training
    Ahoua: Interview secretary and students; Ega contracts: which village(s)?

M0-M3

1.9.00-31.10.00
    Gibbon: Training (UBI & ABJ); specifications: XML, labelling, lexicography, questionnaire preparation; kick-off visit (universities and Ega authorities)
    2 UBI students: Training in XML; drafts of labelling & lexicography manuals
    Connell: Sociolinguistic questionnaire preparation (which dialects, which other formal and natural data varieties?)
    Ahoua: Training secretary and students
    ABJ secretary: Prepare standardised electronic documents from existing materials
    n Ega assistants: Contract with Ega village chief(s): payment for village, chief, & assistants
    3 ABJ students: Training in standard methods: X-SAMPA labelling, lexicography (Ahoua, Gibbon); collation of existing materials

1.11.00-31.12.00
    Gibbon: Draft lexicon acquisition tools; check and further development of lexicon and labelling
    2 UBI students: Sample lexicon database with ABJ wordlist; sample labelling with ABJ transcriptions
    Connell: Digitisation of existing data
    Ahoua: Wordlist; check transcriptions; orthography proposal; fieldwork: apply sociolinguistic questionnaire; initial DAT database collection: formal and natural data
    ABJ consultants: Fieldwork consultancy; agreement on orthography, categories
    ABJ secretary: Standardised transcription & lexicon preparation
    n Ega assistants: Ega assistants for formal data
    3 ABJ students: X-SAMPA transcriptions of formal data; labelling of formal data

M3-M6

1.1.01-31.3.01
    Gibbon: Questionnaire on morphology, lexicography; documentation software; prosody software
    2 UBI students: Initial integration of document types; test of software
    Connell: Evaluation of sociolinguistic questionnaires recorded in M0-M3
    Ahoua: Questionnaire on phonetics, phonology (cf. wordlist for laryngograph, airflow data)
    ABJ consultants: Questionnaire on morphology, lexicography; formal, functional & discourse syntax
    ABJ secretary: Standardised formatting of questionnaires
    3 ABJ students: X-SAMPA transcriptions of formal data; labelling of formal data
    Training in ABJ: Gibbon, Connell, Ahoua, ABJ consultants, 3 ABJ students
    Fieldwork in Ega land: Gibbon, Connell, Ahoua; ABJ consultants (fieldwork consultancy); n Ega assistants (fieldwork: both formal and natural data)
    Workshop (invitation of Ega expert Bolé-Richard for sharing data, results): Gibbon, Connell, Ahoua, ABJ consultants, 3 ABJ students
    Ahoua: Bielefeld

M6-M9

1.4.01-30.6.01
    Gibbon: Multi-tier annotations; revisions of manuals
    2 UBI students: Multi-tier annotations for digital signals
    Connell: Phonological and phonetic analysis; revised manuals
    Ahoua: Tonemic analysis, transcription, labelling
    ABJ consultants: Morphological, syntactic, functional analysis; lexical integration
    ABJ secretary: Collation of further Bolé-Richard data
    3 ABJ students: Multi-tier annotations

M9-M12

1.7.01-30.9.01
    Gibbon: Final report draft and evaluation proposals with all participants
    2 UBI students: Final database and hypertext documents
    Connell: Document revision, translation
    Ahoua: Document revision, translation
    ABJ consultants: Revision
    ABJ secretary: Typing final report
    3 ABJ students: Multi-tier annotations

Task specification outline

Analysis of the spectrum of tasks yields four main task types which are associated with four documentation phases. The phases are, of course, not intended to be strictly chronological, but are embedded in a complex cyclic cooperative process.



Preparatory phase
    Perform preparatory recordings and analysis
    Establish descriptive categories
    Determine varieties of language to be recorded
    Prepare fieldwork materials (e.g. questionnaire; other instructions)
    Negotiate fieldwork and other contracts
    Select and prepare software tools
    Train project personnel

Field phase
    Digital recordings
    Different locations (e.g. home, field, school)
    4 hrs per day
    1 hr sessions with breaks
    Evening résumé
    Deployment of fieldwork software tools

Analysis phase
    Transcription
    Lexicon
    Signal labelling
    Phonological analysis
    Prosodic analysis
    Morphological analysis
    Syntactic analysis
    Dialogue analysis
    Further studies at specific levels
    Specimen prosodic model analyses

Archiving phase
    Style-structured Word documents
    Excel lexicon database
    Inheritance lexicon
    X-SAMPA transcription specs and transcriptions
    Praat-labelled acoustic signals
    Signal visualisations: waveforms, spectrograms, pitch traces
    DBMS interface conversion filters
    Hyperdocument conversion filters
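The X-SAMPA transcriptions in the archive are plain ASCII. For hyperdocument display, they can be converted to IPA characters. The sketch below uses greedy longest-match replacement with a deliberately small table: a few implosives, vowels, nasals and the tone diacritics _H, _M and _L. The table is illustrative and is not the project's transcription specification.

```python
# Small, illustrative subset of X-SAMPA -> IPA (assumption: not the project's
# full transcription table). Longer symbols must win over their prefixes.
XSAMPA_TO_IPA = {
    "J\\_<": "\u0284",  # voiced palatal implosive
    "J\\": "\u025f",    # voiced palatal plosive
    "b_<": "\u0253",    # voiced bilabial implosive
    "d_<": "\u0257",    # voiced alveolar implosive
    "g_<": "\u0260",    # voiced velar implosive
    "E": "\u025b",      # open-mid front vowel
    "O": "\u0254",      # open-mid back rounded vowel
    "I": "\u026a",      # near-close front vowel
    "U": "\u028a",      # near-close back rounded vowel
    "@": "\u0259",      # schwa
    "N": "\u014b",      # velar nasal
    "J": "\u0272",      # palatal nasal
    "_H": "\u0301",     # high tone: combining acute accent
    "_M": "\u0304",     # mid tone: combining macron
    "_L": "\u0300",     # low tone: combining grave accent
}
MAX_LEN = max(len(symbol) for symbol in XSAMPA_TO_IPA)

def xsampa_to_ipa(text):
    """Convert X-SAMPA to IPA by greedy longest match; unknown characters
    are copied unchanged."""
    out, i = [], 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in XSAMPA_TO_IPA:
                out.append(XSAMPA_TO_IPA[chunk])
                i += size
                break
        else:
            out.append(text[i])  # no table entry starts here
            i += 1
    return "".join(out)
```

Longest match matters here because `J` alone is the palatal nasal, while `J\` and `J\_<` are the palatal plosive and implosive. A shortest-match conversion would silently produce the wrong consonant.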


In the training phase, personnel are familiarised with modern documentation concepts, methods and software tools. These concepts include the distinction between three levels:

  1. information structure (document semantics);

  2. document structure (document architecture, document syntax);

  3. media structure (presentation, rendering).

Dipl. Inf. Soma Ouattara will assist in local training. He is a member of the Computer Science Department of the Institut de Recherche en Mathématique Appliquée at Université de Cocody, and is being trained at Universität Bielefeld in XML document definition and production and in re-formatting techniques for legacy data.