By a benchmark test we mean an efficient, easily administered test, or set of tests, that can be used to express the performance of a speech output system (or some module thereof) in numerical terms. The benchmark itself is the value that characterises some reference system, against which a newly developed system is (implicitly) compared. The benchmark is preferably chosen to represent a performance level that is known to guarantee user satisfaction. Consequently, if the performance of a new product exceeds the benchmark, its designer or prospective buyer is assured of at least a satisfactory product, and probably an even better one. Testing against a benchmark is also more efficient than pairwise or multiple testing of competing products, since each new system has to be tested only once, against a fixed reference value.
At present it is too early to speak of either existing benchmarks or benchmark tests. It is clear, however, that the development of benchmarking deserves high priority in the speech output assessment field. As a first step, existing tests should be scrutinised for their potential use as benchmark tests. Choices must then be made as to which aspects to include in benchmark tests (overall performance, or the composite performance of a number of crucial modules), and which system to adopt as the reference on which the benchmark value is to be based. In this respect, we believe that the performance of human speech should not be adopted as the benchmark. Human speech, at least when produced by professional talkers, is simply too good for the purpose of benchmarking. Since human speech will always be superior to synthetic speech, the quality of the latter would have to be expressed as a fraction of human performance, which makes it hard to compare the relative differences between different types of synthetic speech. What we need instead is a speech output system of proven, but still imperfect, quality. This is quite probably the reason why the quality of many speech output systems for English is expressed relative to the "Paul" voice of MITalk/DECtalk, which has long served as the de facto standard in TTS.