As indicated above, the domain of the spoken language technologies ranges from speech input and output systems to complex understanding and generation systems, including multimodal systems of differing complexity (such as automatic dictation machines) and multilingual systems (with applications in different languages, but also, as in speech-to-speech translation systems, integrating processors for more than one language). The definition of de facto standards and evaluation methodologies for such systems involves the specification and development of highly specific spoken language corpus and lexicon resources together with suitable measurement and evaluation tools.
These requirements currently still determine considerable differences between spoken and written language in terms of paradigms and techniques of measurement and evaluation, which range from different practical and legal requirements for corpus construction to differences in experimental paradigms for the quality control of working systems. In these areas, the de facto standards are derived from the consensus within the spoken language community on evaluation methods and the resources required for these.
Of course, spoken language technology is still a relatively young area and thus the so-called standards that are discussed here represent only the first rung of the ladder towards the more formal standards which might emerge at a later date. The use of the term "standards" in the R&D community and in the context of this handbook is more usefully interpreted in terms of guidelines and recommended practices. The emergence of more prescriptive actions such as professional codes of conduct, quality marks and formal standards still lies very much in the future.
Nevertheless, the requirement for agreed standards and guidelines pervades all of the links in the spoken language system R&D chain starting from the research community (for algorithm development and benchmarking), to product developers (for performance optimisation), system integrators (for component selection), manufacturers (for quality assurance), sales staff (for marketing), customers (for product selection) and users (for service selection).
Of course, activity in the area of standards and resources for spoken language systems is not new; for many years, the majority of spoken language R&D groups have appreciated the value of sharing recorded speech material and the importance of establishing appropriate infrastructure in terms of standardised tools, research methodology, data formats, testing procedures etc. Indeed, the national research communities in a number of countries have put into place mechanisms for discussing and exchanging such information either as a result of an initiative on the part of the research community itself (for example, the Speech Technology Assessment Group - STAG - was set up in the UK under the auspices of the Institute of Acoustics in 1983 and the IEEE operated a similar working group in the USA over ten years ago) or mediated by a central agency (such as GRECO in France and DARPA in the USA). Also, several national standards organisations have become involved, notably the National Institute of Standards and Technology (NIST - formerly the National Bureau of Standards) in the USA, the National Physical Laboratory (NPL) in the UK and AFNOR in France.
The most significant activity on spoken language standards and resources in Europe has without doubt been the ESPRIT Speech Assessment Methods (SAM) project which ran from 1987 to 1993 [Fourcin (1993), Winski & Fourcin (1994)]. The SAM project arose out of the need to develop a common methodology and standards for the assessment of speech technology systems which could be applied within the framework of the different European languages. The definition of the project took place in the context of several ongoing national and international programmes of research including the UK Alvey programme, GRECO in France, COST in Europe and DARPA in the USA.
The SAM project was based on a collaboration between almost thirty laboratories in eight different countries: six countries within the EU and two from EFTA. Work was conducted in three interconnected areas:
Within this structure SAM established a set of common tools which have become widely used in a large number of participating and non-participating speech research laboratories. These tools included a reference workstation, a recommended set of protocols for recording, storing, annotating and distributing speech data, and a standard machine readable phonetic alphabet.
The SAM reference standard workstation (SESAM) was designed to provide a gateway between one European speech research laboratory and another. The minimum hardware requirements were an IBM PC-AT (or compatible) computer, an analogue interface board (OROS-AU21 or AU22), 1Mbyte of extended memory and a CD-ROM reader. SESAM hosted all SAM software products including EUROPEC, VERIPEC, PTS and ELSA for speech data collection and annotation, EURPAC and SAM_SCOR for measuring the performance of speech recognition systems, and SOAP for measuring the performance of speech synthesis systems.
The first SAM corpus - EUROM-0 - was distributed on a single CD-ROM and contained five hours of speech material. A second corpus - EUROM-1 - used the same standard format with sixty talkers in each of eight languages, speaking phonetically balanced CVC words, number sequences up to 9999 and situationally linked sentences.
In parallel with (and subsequent to) SAM, a number of other EU funded projects have focused on spoken language standards and resources. For example, SQALE was concerned with the assessment of large-vocabulary automatic speech recognition systems across different EU languages and both SUNDIAL and SUNSTAR were directed towards the assessment of multimodal interactive systems.
Other projects with significant outputs in the domain of assessment and resources include ARS, RELATOR, ONOMASTICA and SPEECHDAT, as well as major national projects and programmes of research such as the German VERBMOBIL project. In particular, one of the most important achievements of the SPEECHDAT project has been to initiate the creation of the European Language Resources Association (ELRA).
The European Language Resources Association was established in Luxembourg in February, 1995, with the goal of creating an organisation to promote the creation, verification, and distribution of language resources in Europe. A non-profit organisation, ELRA aims to serve as a central focal point for information related to language resources in Europe. It is intended that it will help users and developers of European language resources, as well as government agencies and other interested parties, exploit language resources for a wide variety of uses. It will also oversee the distribution of language resources via CD-ROM and other means and promote standards for such resources. Eventually, ELRA will serve as the European repository for EU-funded language resources and interact with similar bodies in other parts of the world (such as the LDC - see below).
ELRA membership is open to any organisation, public or private. Full Membership, with voting rights, is available to organisations established in the EU or European Economic Area. Organisations based elsewhere may participate as subscribers. Purely for organisational purposes, members are classified by their chief interest (spoken, written, or terminological resources). The annual membership fee has been set at a level which would encourage broad participation.
At the international level, the NATO Research Study Group on Speech Processing (NATO/AC342/Panel III/RSG10) has, since the late 1970s, provided an effective mechanism for exchanging information on spoken language standards and resources between Canada, France, Germany, the Netherlands, the UK and the USA [Moore (1986)]. RSG10 was responsible for the first publicly available multilingual speech corpus, and has subsequently released on CD-ROM a database of noises from a range of selected military and civil environments (NOISE-ROM) and related experimental test data (NOISEX).
Also, at each IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) during the 1980s, Janet Baker of Dragon Systems regularly organised a side-meeting to discuss speech databases and opportunities to share such data between different laboratories.
More recently, the International Committee for Collaboration in Speech Assessment and Databases - COCOSDA - was established in 1990 to encourage and promote international interaction and cooperation in the foundation areas of Spoken Language Processing [Moore (1991)]. COCOSDA provides a forum for international action and discussion and gives platforms for groups of workers to exchange information and to set up collaborations in the field of Spoken Language Engineering. Very many of the world's leading workers are amongst its members and the group discussions are open and unconstrained by any special interests. Meetings take place annually as a satellite event to one of the major international conferences.
In the US, the Linguistic Data Consortium (LDC) was founded in 1992 to provide a new mechanism for large-scale development and widespread sharing of resources for research in linguistic technologies. Based at the University of Pennsylvania, the LDC is a broadly-based consortium that, in 1995, included about 65 companies, universities, and government agencies. An initial grant of $5 million from ARPA amplified the effect of contributions (both of money and of data) from the broad membership base, so that there is guaranteed to be far more data than any member could afford to produce individually. In addition to distributing previously-created databases, and funding or co-funding the development of new ones, the LDC has helped researchers in several countries to publish and distribute databases that would not otherwise have been released.
The operations of the LDC are closely tied to the evolving needs of the research and development community that it supports. Since research opportunities increasingly depend on access to the consortium's materials, membership fees have been set at affordable levels, and membership is open to research groups around the world. Although US government investment in LDC database development is continuing, a significant fraction of the consortium budget comes from membership fees. These fees are now adequate to support the central staff organisation, pay database publication costs and underwrite some database creation.