EGA: Documentation model for an endangered Ivorian language
Project description, methods, and scheduling
Firmin Ahoua, Bruce Connell, Dafydd Gibbon
http://coral.lili.uni-bielefeld.de/EGA/
DOBES Workshop
Nijmegen, 15-16 Sept. 2000
The pilot project aims to develop model documentation for the endangered language Ega, the westernmost Kwa language, a linguistic isolate enclaved in a Kru-speaking area, with a few hundred remaining speakers and an unusual linguistic complexity which promises new insight into the history of the Niger-Congo languages. The documentation will build on resource types developed by the proposers in language documentation and technology projects, and on the few extant Ega resources. In cooperation with the multimedia database project, a standardised model interactive questionnaire and other survey-oriented structured document types, e.g. hyperlexicon and hypertext concordance, will be specified to implementation standard in close cooperation with linguists of the Université de Cocody, Abidjan, and a network of international specialists. Full details of this and other aspects of the project are available at the project website noted above.
Ega, spoken in east central Côte d'Ivoire, is the westernmost Kwa-related language (Niger-Congo phylum), an isolate within the Dida (Kru) speaking area, with no known closely related languages. The classification is tentative and somewhat controversial. The influence of Kru (Bete and Godie) is strong, and a minority opinion holds that Ega is not Kwa but a true Niger-Congo isolate, a remnant language. Ega has more complex phonetics, phonology and morphology than other Kwa languages, indicating, at least, the presence of reflexes of archaic stages of Kwa language development.
Several social and political factors contribute to the endangerment of Ega. The most salient factors, which make Ega a prime candidate as the focal point of a model documentation project, are the following:
Small number of speakers; estimates vary widely, from several thousand two decades ago (Bolé-Richard 1982) to fewer than 300 at present (Ethnologue); preliminary field investigations indicate a small group of non-contiguous Ega villages with a total, rapidly decreasing population of no more than 1000.
Enclaved in the Dida (Kru) speech community.
Speakers perfectly fluent in Dida, which they use as a public language.
Perception of enclaving Dida community as stronger, dominant, efficient, high prestige.
Submissive behaviour with respect to the enclaving Dida community.
Traditional ethnic history relates to the westward Kwa migrations through other Kwa areas (e.g. Abbey) as well as Kru areas, and emphasises status of the Ega community as guests in foreign territories.
Supposition of a centuries old non-aggression, tolerance pact with the dominant Dida community.
Use of Ega deprecated among Ega speakers; self-characterisation as Dida speakers to outsiders.
Preference for exogenous marriage among Ega men, invariably with Dida women.
Use of French in education and official communications, and recent introduction of major Ivorian languages to schools, e.g. Baule (Kwa), Bété (Kru), effectively requiring Ega speakers to be trilingual or quadrilingual.
Preliminary investigations indicated that Ega has the following known linguistic properties which in themselves justify careful and systematic documentation before the language becomes extinct:
The most complete and still active series of contrastive voiced implosive consonants, including palatal and labio-velar implosives, in contrast with non-implosives. Mbatto (Kwa) has similar features, but is in the process of losing the contrast.
The most complete nine vowel ATR harmony that consistently operates in prosodic words.
Complex vowel hiatus, described in a recent MA thesis (Dago 1999).
The most complete set of nominal and gender class prefixes (Bolé-Richard 1983), of a kind already widely attested in Benue-Congo languages, suggesting that the nominal classes have been almost entirely lost in other Kwa languages. This leads to the hypothesis that documentation of Ega has the potential to make a significant contribution to the understanding of the unity of the Niger-Congo language family.
A complex conjugation with an intricate tense/aspect system rarely documented among Kwa languages (Bolé-Richard 1982).
The materials which are available to date are somewhat fragmentary, and consist of isolated (and unreliable) comments in more general works, and the following specific material:
Ahoua, F. and B. Connell (1999): 120 minutes of laryngograph and speech signal recordings of a set of implosives in Ega with 4 native speakers.
Ahoua, F. and B. Connell (1999): 10 minutes of video film on a Sony Camcorder recording larynx movement using a mirror.
Bolé-Richard, Rémy. 1982. L'Ega. In Atlas des langues Kwa, Vol. 1, ed. by Georges Hérault. ILA: Abidjan.
Bolé-Richard, Rémy. 1983. La Classification Nominale en Ega. Journal of West African Languages XIII, 1.
Bolé-Richard, Rémy. 1983. A word list of Ega. Unpublished ms.
Bolé-Richard, Rémy. 1982. Ega 350 word list. In Atlas des langues Kwa, Vol. 2, ed. by Georges Hérault. ILA: Abidjan.
Dago, Georgette. 1999. The Phonology of Ega. Mémoire de Maîtrise, Abidjan. [Supervised by Firmin Ahoua; contains an extension of the Bolé-Richard (1983b) word list, totalling approximately 1000 words.]
Oko, Touré Cyprien. 2000. Interview sur la langue Ega. Conducted by Dafydd Gibbon & Eddy Aimé Gbéry. August 2000. Département de Linguistique, Université de Cocody, Abidjan, Côte d'Ivoire. 2 hour DAT tape, 3 CD-ROMs.
These materials will be systematically processed according to the methods outlined in the pilot project work programme. Additionally, with the cooperation of our consultants and other colleagues at U Cocody, Abidjan, further fieldwork will be conducted in order to collect additional primary data types as specified, secondary data types (2000 word lexicon, phonology, morphology, syntax), and basic ethnographic and sociolinguistic information (see following sections).
(Adapted from Vossen 1988, Connell 1998, Schaefer and Egbokhare 1999)
District
Village
Name of Chief
Population size
Names of Ethnic groups/clans resident in the village:
Names of languages spoken and known by the village population:
Language(s) of intergroup communication:
Language(s) used as medium of instruction in school: (years 1 - 3, years 3+):
Village description:
location on map
institutions (e.g. schools, clinics, religious, water pumps)
main occupations of residents
residential distribution of population
links with nearest towns
notes on village history
District:
Village:
Residential Area:
Husband's ethnic group/clan:
Wife's ethnic group/clan:
Husband's occupation:
Wife's occupation:
Religion:
Language knowledge among family members:
| | Adults | Children (5-15) | Children (1-5) |
| 1st lang | | | |
| 2nd lang | | | |
| 3rd lang | | | |
| 4th lang | | | |
Language or dialects spoken at home:
| | Adults | Children (5-15) | Children (1-5) |
| 1st | | | |
| 2nd | | | |
| 3rd | | | |
| 4th | | | |
Language or dialects used outside the home to speakers of the same ethno-linguistic group:
| | Adults | Children (5-15) | Children (1-5) |
| 1st | | | |
| 2nd | | | |
| 3rd | | | |
| 4th | | | |
Language or dialects used outside the home to speakers of a different ethnolinguistic group:
| | Adults | Children (5-15) | Children (1-5) |
| 1st | | | |
| 2nd | | | |
| 3rd | | | |
| 4th | | | |
What is your name?
a. How old are you? b. Which school year are you in?
Are you a boy or a girl?
In which village were you born?
How long have you lived in ...............?
What is your language?
Who is your father?
What is his language?
Who is your mother?
What is her language?
Which languages are spoken in your village?
What is your first language?
Which other languages do you speak?
Which language do you speak best?
Which language do you prefer to speak?
Which other languages do you understand?
Which language do you speak at home/with your mother? With your father?
Which language do you speak with your brothers and sisters?
Which language do you speak with friends?
Which language do you speak with strangers?
Which language do you speak when greeting elders or old people?
Which language do you speak when joking?
Which language do you speak when you are angry?
Which language do you speak at the market?
Which language do you speak in the fields?
Which language is spoken at church?
Which language is spoken at school?
Would you like your language to be used at school?
Would you like to be able to write your language?
One of the basic documentation materials is an outline questionnaire on language typology, which was developed in a joint DAAD project with Université de Cocody, Abidjan, Côte d'Ivoire from 1997 to 2000. The original questionnaire contained outlines of different documentation levels for ethnological and linguistic analyses, and within the linguistic section for language families, languages and major varieties, dialects and sociolects. The questionnaire has several functions in the present project:
to support efficient initial fieldwork elicitation and interview-based analysis;
to act as a checklist for the transcription, annotation and analysis of recordings in cooperation with language consultants;
to act as a preliminary specification for document design at the document architecture level;
to be used as a training document for consultants and graduate students at Université de Cocody for later deployment in the documentation of other endangered languages.
The questionnaire document type is not used as a 'Procrustean bed' but is adapted to specific languages by linguistic experts in the project; this is the first task commissioned from the local scientific consultants at Université de Cocody.
Linguistic documentation: level of the speech variety
1 Situation of the language
1.1 Terminological situation: glossonymy
1.2 Ethnographic situation
1.3 Genetic situation
1.3.1 Immediate genetic affiliation
1.3.2 Subdivision into dialects
1.4 Sociolinguistic situation
1.4.1 Internal sociolinguistic situation
Number of primary speakers
Vitality
Social stratification
Social groups
Neighbourhoods
Professions
Communicative conventions
Terms of address
Social functions
Specialised languages
Variants
Activity schemas
Macro-formats
Communicative practices
Pragmatic routines
Paralinguistic conventions
Literature
Degree of standardisation and modernisation
Neology
1.4.2 External sociolinguistic situation
Competing languages and multilingualism
Degree of linguistic knowledge/instrumentalisation/description
Status of the language
Language attitudes
1.5 Historical situation
1.5.1 Earlier stages
1.5.2 Earlier contacts
1.5.3 Recent changes
2 System of the language
2.1 Phonology
2.1.1 Phonetic system
Consonants
Vowels
2.1.2 Tonal system
2.1.3 Phonotactic system
2.1.4 Phonological processes
2.2 Morphology
2.2.1 Word classes
2.2.2 Inflection
Pronominal
Nominal
Verbal
2.2.3 Word formation
Derivation
Compounding
2.3 Syntax
2.3.1 Sentence types
Declarative
Interrogative
Imperative
Exclamative
2.3.2 Simple sentence
Verbal
Nominal
Possessive
Existential-locative
2.3.3 Noun phrase
Possessive modifier
Attributive adjective
Quantifier
Numeral
Determiner
2.3.4 Verb phrase
Arguments
Adjuncts
2.3.5 Complex sentence
Substantive (complement clause)
Attributive (relative clause)
Adverbial
Verb serialisation
2.4 Lexicon
2.4.1 Basic vocabulary
2.4.2 Numeration
2.4.3 Kinship system
2.4.4 Onomastics
Anthroponymy
Toponymy
Hydronymy
3 Texts
3.1 Sample texts
3.2 Sample analysis
4 Bibliography
A number of dialogue types designed to elicit specific linguistic and communicative patterns will be used (e.g. blocks worlds, TinkerToy construction dialogues), as well as other descriptive, negotiative and ritual dialogue types.
The main activity of the pilot documentation project is the collection and partial annotation of a variety of selected data types at a number of different levels, centring on the digital speech signal. The physical speech signal forms the basis of all field work, of course, in that it is perceived and analysed, with or without physical electronic recording, by the field linguist. However, very few physical speech signal data corpora are available for minority and, in particular, endangered languages, even in comparison with the small amounts of transcribed or other textual data. For this reason, speech data collected in the proposed project will be labelled at a fine (at least syllabic) level of granularity.
Data Matching Principle (DMP): Data representations of the same speech signal as a physical event with temporal extent and a particular location are mapped both to the signal and to each other.
Informally, the principle (once stated) seems rather obvious. Both formally and operationally in practice, however, the principle is far from easy to fulfil. Starting with the DMP we define a data matching procedure as the procedure of comparing data reached by different methodologies. Examples of data matching in this sense are:
creation of interlinear translations, glosses, and morphological classifications;
merging of different types of lexical information in the microstructure of a dictionary;
tagging (markup) of texts and transcriptions with part of speech (POS), morphological, semantic, etc. category names;
assignment of parallel markup to the same text;
signal annotation with phonemic, orthographic, or other symbolic labels;
temporal alignment of parallel audio, video, laryngographic, pitch extraction, and airflow signal sources.
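One way to picture the last two kinds of data matching is as a mapping between time-stamped annotation tiers over the same signal. The following is a minimal Python sketch under stated assumptions, not the project's own tooling: the `Interval` type, the function names and the placeholder labels are all hypothetical, and the time stamps are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Interval:
    """One labelled stretch of the speech signal, in seconds."""
    start: float
    end: float
    label: str


def overlap(a: Interval, b: Interval) -> float:
    """Duration in seconds shared by two intervals (0.0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def match_tiers(
    upper: List[Interval], lower: List[Interval]
) -> List[Tuple[Interval, List[Interval]]]:
    """Pair each upper-tier interval with the lower-tier intervals it overlaps."""
    return [(u, [l for l in lower if overlap(u, l) > 0.0]) for u in upper]


# Hypothetical tiers: placeholder labels, not real Ega forms.
words = [Interval(0.00, 0.42, "word1"), Interval(0.42, 0.90, "word2")]
syllables = [
    Interval(0.00, 0.20, "syl1"),
    Interval(0.20, 0.42, "syl2"),
    Interval(0.42, 0.90, "syl3"),
]

matched = match_tiers(words, syllables)
for word, syls in matched:
    print(word.label, [s.label for s in syls])
```

Because both tiers refer to the same time axis, each representation is mapped to the signal and, through the signal, to the other representation, which is the relationship the DMP asks for.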
The project is organised as a sub-consortium between Universität Bielefeld and Université de Cocody, Abidjan, Côte d'Ivoire. The consortial character emerged from the nature of the collaboration among the three principal participants, their institutions, and the international network of scholars and institutions who have expressed interest in cooperating:
Principal participants:
Dr. Firmin Ahoua (Université de Cocody, Abidjan, Côte d'Ivoire, local coordinator)
Dr. Bruce Connell (Oxford University, fieldwork coordinator)
Prof. Dafydd Gibbon (Universität Bielefeld, project director)
Tasks and involvement of local scientific consultants (Université de Cocody, Abidjan, Côte d'Ivoire):
Phonology, phonetics: Dr. Ahoua, Dr. Connell
Lexicography: Dr. Gbéry, Prof. Gibbon
Functional syntax, discourse: Prof. Kouadio
Structural syntax: Prof. Mel
Morphology: Prof. Yago (Directeur du Département), Prof. Gibbon
Technical direction (computation, formats, software tools, databases, electronic publishing):
Prof. Gibbon
Thorsten Trippel, M.A.
Dipl. Ing. Soma Ouattara M.Sc. (Institut de Recherche en Mathématique Appliquée, Université de Cocody, Abidjan, Côte d'Ivoire)
| Dates | Gibbon | 2 UBI stud. | Connell | Ahoua | ABJ consultants | ABJ secr. | n EGA asst. | 3 ABJ stud. |
| Pre-project | | | | | | | | |
| 1.9.00-30.09.00 | Nijmegen workshop; order equipment; training | | | Interview secretary, students, Ega contracts: which village(s)? | | | | |
| M0-M3 | | | | | | | | |
| 1.9.00-31.10.00 | Training: UBI & ABJ; Specs: XML, labelling, lexicography, questionnaire prep.; kick-off visit (universities and Ega authorities) | Training w. XML; drafts of labelling & lexicography manuals | Socioling. questionnaire prep. (which dialects, other formal and natural data varieties?) | Training secretary, students | | Prepare standardised electronic documents from existing materials | Contract w. Ega village chief(s): payment for village, chief, & assistants | Training: standard methods: X-SAMPA labelling, lexicography (Ahoua, Gibbon); collation of existing materials |
| 1.11.00-31.12.00 | Draft lexicon acquisition tools | Sample lexicon database with ABJ wordlist | Digitisation of existing data | Wordlist; check transcriptions; orthography proposal | | Standardised transcription & lexicon preparation | | |
| | Check and further development of lexicon and labelling | Sample labelling with ABJ transcriptions | | Fieldwork: apply sociolinguistic questionnaire; initial DAT database collection: formal and natural data | Fieldwork consultancy; agreement on orthography, categories | | Ega assistants for formal data | X-SAMPA transcriptions of formal data; labelling of formal data |
| M3-M6 | | | | | | | | |
| 1.1.01-31.3.01 | | | | | | | | |
| | Questionnaire morphology, lexicography; documentation software; prosody software | Initial integration of document types; test of software | Evaluation of sociolinguistic questionnaires recorded in M0-M3 | Questionnaire phonetics, phonology - cf. wordlist for laryngograph, airflow data | Questionnaire morphology, lexicography; formal, functional & discourse syntax | Standardised formatting of questionnaires | | X-SAMPA transcriptions of formal data; labelling of formal data |
| | Training in ABJ | | Training in ABJ | Training in ABJ | Training in ABJ | | | Training in ABJ |
| | Fieldwork in Ega land | | Fieldwork in Ega land | Fieldwork in Ega land | Fieldwork consultancy | | Fieldwork: both formal and natural data | |
| | WORKSHOP: Invitation of EGA expert Bolé-Richard for sharing data, results | | WORKSHOP: Invitation of EGA expert Bolé-Richard for sharing data, results | WORKSHOP: Invitation of EGA expert Bolé-Richard for sharing data, results | WORKSHOP: Invitation of EGA expert Bolé-Richard for sharing data, results | | | WORKSHOP: Invitation of EGA expert Bolé-Richard for sharing data, results |
| | | | | Ahoua Bielefeld | | | | |
| M6-M9 | | | | | | | | |
| 1.4.01-30.6.01 | Multi-tier annotations; revisions of manuals | Multi-tier annotations for digital signals | Phonological and phonetic analysis; revised manuals | Tonemic analysis, transcription, labelling | Morphol., synt., funct. analysis; lexical integration | Collation of further Bolé-Richard data | | Multi-tier annotations |
| M9-M12 | | | | | | | | |
| 1.7.01-30.9.01 | Final report draft and evaluation proposals with all participants | Final database and hypertext documents | Document revision, translation | Document revision, translation | Revision | Typing final report | | Multi-tier annotations |
Analysis of the spectrum of tasks yields four main task types which are associated with four documentation phases. The phases are, of course, not intended to be strictly chronological, but are embedded in a complex cyclic cooperative process.
| Documentation phases | Task types |
| Preparatory phase | Perform preparatory recordings and analysis |
| | Establish descriptive categories |
| | Determine varieties of language to be recorded |
| | Prepare fieldwork materials (e.g. questionnaire; other instructions) |
| | Negotiate fieldwork, etc. contracts |
| | Select and prepare software tools |
| | Train project personnel |
| Field phase | Digital recordings |
| | Different locations (e.g. home, field, school) |
| | 4 hrs per day |
| | 1 hr sessions w. breaks |
| | Evening résumé |
| | Deployment of fieldwork software tools |
| Analysis phase | Transcription |
| | Lexicon |
| | Signal labelling |
| | Phonological analysis |
| | Prosodic analysis |
| | Morphological analysis |
| | Syntactic analysis |
| | Dialogue analysis |
| | Further studies at specific levels |
| | Specimen prosodic model analyses |
| Archiving phase | Style-structured Word documents |
| | Excel lexicon database |
| | Inheritance lexicon |
| | X-SAMPA transcription specs and transcriptions |
| | Praat-labelled acoustic signals |
| | Signal visualisations: waveforms, spectrograms, pitch traces |
| | DBMS interface conversion filters |
| | Hyperdocument conversion filters |
In the training phase, personnel are familiarised with modern documentation concepts (e.g. the distinction between information structure (document semantics), document structure (document architecture, document syntax) and media structure (presentation, rendering)), methods and software tools. To assist in local training, Dipl. Inf. Soma Ouattara, a member of the Computer Science Department of the Institut de Recherche en Mathématique Appliquée at Université de Cocody, is being trained at Universität Bielefeld in XML document definition and production, and in re-formatting techniques for legacy data.
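The three-way distinction between information structure, document structure and media structure can be illustrated with a single lexicon entry. The following is a minimal Python sketch, not the project's actual document type: the element names (`entry`, `headword`, `pos`, `gloss`), the function names and the placeholder values are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from html import escape

# Information structure: what is recorded.
# Hypothetical field names and placeholder values, not real Ega data.
entry = {"headword": "HEADWORD", "pos": "noun", "gloss": "GLOSS"}


def to_xml(info: dict) -> ET.Element:
    """Document structure: arrange the information as an XML element tree."""
    elem = ET.Element("entry")
    for field in ("headword", "pos", "gloss"):
        ET.SubElement(elem, field).text = info[field]
    return elem


def render_text(elem: ET.Element) -> str:
    """Media structure, variant 1: a plain-text line."""
    return f"{elem.findtext('headword')} ({elem.findtext('pos')}): {elem.findtext('gloss')}"


def render_html(elem: ET.Element) -> str:
    """Media structure, variant 2: an HTML fragment for a hypertext page."""
    return (
        f"<p><b>{escape(elem.findtext('headword'))}</b> "
        f"<i>{escape(elem.findtext('pos'))}</i> "
        f"{escape(elem.findtext('gloss'))}</p>"
    )


xml_entry = to_xml(entry)
print(ET.tostring(xml_entry, encoding="unicode"))
print(render_text(xml_entry))
print(render_html(xml_entry))
```

The same XML element feeds both renderings, so a change of presentation medium does not touch the recorded information or its document structure.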