Current hardware uses a variety of in-house or more widespread standards. For example, samples are coded on 8 or 16 bits on Apple (Mac) machines and PCs, in U-LAW on 8 or 16 bits on SUN (Sparc), NEXT, VAX and DEC machines, and in U-LAW or A-LAW on HP machines. Available sampling rates are often limited to 8 kHz in the UNIX world, but higher rates may be available in the PC world (DOS/Windows) and on the Mac, depending on whether standard or professional I/O boards are used.
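As an illustration of the U-LAW (mu-law) coding mentioned above, the following minimal sketch applies the continuous mu-law companding curve with mu = 255 and quantises the result to 8 bits. The function names and the 8-bit code layout are assumptions made for the example; this is not the exact bit layout used by any particular workstation or by the G.711 segment coding.

```python
import math

MU = 255  # companding constant of 8-bit mu-law (U-LAW) coding


def mulaw_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x: float) -> int:
    """Quantise the companded value to a code in 0..255 (illustrative layout)."""
    return round((mulaw_compress(x) + 1.0) / 2.0 * 255)


def decode_8bit(code: int) -> float:
    """Recover an approximate linear sample from an 8-bit code."""
    return mulaw_expand(code / 255 * 2.0 - 1.0)
```

Because the curve is logarithmic, small amplitudes are quantised more finely than large ones, which is why 8 companded bits can be adequate for telephone-band speech where 8 linear bits would not.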
File formats are usually identified by their filename extension. Computer manufacturers such as NEXT and SUN deal with .au (AU) or .snd (SND) files, Apple and Silicon Graphics with .aif (AIF); I/O board manufacturers may promote their own format (such as .voc for SoundBlaster boards), and the developer of the Windows operating system, MICROSOFT, is trying to impose its .wav (WAVE) format. The situation is complicated by the encoding method (linear, compressed, data and information intermingled, etc.), and even for the same filename extension the implementation may vary slightly between operating systems (WAVE in Windows or UNIX environments, SND in NEXT or PC/Mac environments). A standardisation initiative has come with the development of the Internet, which promotes an interchange format called MIME.
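Since the filename extension alone does not guarantee the format, a tool can inspect the first bytes of the file instead. The sketch below checks the well-known signatures of the formats named above; the function name `sniff_audio_format` and the returned labels are hypothetical, and the check says nothing about the encoding used inside the file.

```python
def sniff_audio_format(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if header[:4] == b".snd":
        return "AU/SND"  # Sun/NeXT header
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "AIFF"    # Apple / Silicon Graphics
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "WAVE"    # Microsoft RIFF WAVE
    if header[:20] == b"Creative Voice File\x1a":
        return "VOC"     # SoundBlaster
    return "unknown"
```

A caller would typically read the first 20 bytes of the file in binary mode and pass them to this function before choosing a conversion routine.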
The implications of multimedia standards development in the PC world provide a major example of the constraints that the market imposes on the speech research community.
The world of PCs has evolved considerably during the past few years along two relevant dimensions:
The question now is whether these current boards, primarily dedicated to audio output, can satisfy the needs of speech research and applications in terms of:
(*)(**) Using I/O boards without a DSP implies that some signal processing will be offloaded to the PC (speech level detection, min/max measurement, possible over- or undersampling). These on-line procedures, together with on-line format conversion routines, could increase the CPU load to such an extent that low-end SESAM workstations might not be able to cope with, for example, a high speech sampling rate or two-channel operation.
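To give an idea of the per-sample work that falls on the host CPU when the board has no DSP, the following sketch computes the min/max values and an RMS level for one block of samples. It assumes 16-bit signed little-endian PCM input; the function name `block_stats` and the choice of dB relative to full scale are illustrative assumptions, not part of the SESAM specification.

```python
import math
import struct


def block_stats(pcm: bytes) -> dict:
    """Min, max and RMS level of a block of 16-bit little-endian PCM samples."""
    n = len(pcm) // 2
    samples = struct.unpack("<%dh" % n, pcm[: 2 * n])
    # RMS over the block; an empty block is reported as silence.
    rms = math.sqrt(sum(s * s for s in samples) / n) if n else 0.0
    # Level in dB relative to full scale (32768), -inf for digital silence.
    level_db = 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")
    return {
        "min": min(samples, default=0),
        "max": max(samples, default=0),
        "rms": rms,
        "level_db": level_db,
    }
```

Every sample is touched at least once, so the cost grows linearly with the sampling rate and with the number of channels, which is the load concern raised above.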
One topic is backward compatibility with existing databases; another is which format is going to be ``the standard'', i.e. the worldwide audio/computer/speech standard. This topic is to be considered during the SPEECHDAT project, but it is foreseen that no unique standard will emerge and that conversion routines will remain a big issue. Many tools are available but, as an example, even for the RIFF WAVE format the conversion between the Windows and UNIX worlds is anything but trivial. At the moment, it is not certain whether the current interchangeable standard I/O boards on the market will satisfactorily meet the needs of speech research; this will depend on the target application.
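One plausible source of the Windows/UNIX difficulty is byte order: RIFF WAVE stores its header fields and 16-bit samples in little-endian order, whereas many UNIX workstations are big-endian. The sketch below builds a small WAVE file in memory with the Python standard-library `wave` module, decodes the canonical 44-byte PCM header with explicit little-endian formats, and byte-swaps the samples. The helper names are hypothetical, and real files may carry extra chunks that this simple parser rejects.

```python
import io
import struct
import wave


def make_wave(samples, rate=8000) -> bytes:
    """Build a mono 16-bit RIFF WAVE file in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()


def parse_header(data: bytes) -> dict:
    """Decode a canonical 44-byte PCM header; all numeric fields are little-endian."""
    (riff, size, wav, fmt_id, fmt_len, tag, channels, rate,
     byte_rate, align, bits, data_id, data_len) = struct.unpack(
        "<4sI4s4sIHHIIHH4sI", data[:44])
    if (riff, wav, fmt_id, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a canonical 44-byte PCM WAVE header")
    return {"size": size, "format_tag": tag, "channels": channels,
            "rate": rate, "bits": bits, "data_len": data_len}


def to_big_endian(pcm_le: bytes) -> bytes:
    """Byte-swap 16-bit little-endian samples for a big-endian consumer."""
    n = len(pcm_le) // 2
    return struct.pack(">%dh" % n, *struct.unpack("<%dh" % n, pcm_le[: 2 * n]))
```

Writing the byte order explicitly in every `struct` format keeps the routine independent of the machine it runs on, which is the property a portable conversion tool needs.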