ITU-T Overall Quality Test

Status Proposal [ITU-T (1993)].

Goal Comparative evaluation of overall quality aspects for longer stretches of speech.

Languages In principle applicable to any language as long as suitable stimulus material is available.

Items Speech samples lasting between 10 and 30 s, adapted to the application. Example (mail order shopping): "Miss Robert, the running shoes Adidas Edberg Pro Club, colour: white, size: 11, reference: 501-97-52, price: 319 francs, will be delivered to you in 1 week." It is recommended that a (degraded) human reference be included.

Procedure Rating of (a subset of) eight categorical estimation scales (see below)

Time With 4 test items per system, testing 4 synthesis systems plus 3 reference conditions (i.e. 7 different sources) takes about one hour for one group of subjects, including instructions and a training session.

Analysis Histograms and/or cumulative distributions of the ratings per scale, and mean ratings. The answers to the content questions receive little attention.
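A minimal sketch of the per-scale analysis (histogram, cumulative distribution and mean rating for each source). It assumes that each scale is coded as integers 1 to 5; the number of categories per scale is an assumption, not taken from this description. The function names and the example ratings are hypothetical and serve only as illustration, not as test results.

```python
from collections import Counter
from statistics import mean

# Assumption: every scale is a 5-point categorical scale coded 1..5.
CATEGORIES = range(1, 6)


def histogram(ratings):
    """Number of ratings in each category (empty categories count as 0)."""
    counts = Counter(ratings)
    return [counts[c] for c in CATEGORIES]


def cumulative_distribution(ratings):
    """Fraction of ratings at or below each category."""
    total = len(ratings)
    running, cdf = 0, []
    for count in histogram(ratings):
        running += count
        cdf.append(running / total)
    return cdf


def summarize(ratings_by_source):
    """Histogram, cumulative distribution and mean rating per source for one scale."""
    return {
        source: {
            "histogram": histogram(r),
            "cdf": cumulative_distribution(r),
            "mean": mean(r),
        }
        for source, r in ratings_by_source.items()
    }


# Hypothetical ratings on a single scale, for illustration only.
example = {
    "system_A": [4, 3, 4, 5, 3, 2, 4, 3],
    "reference": [5, 5, 4, 5, 5, 4, 5, 5],
}

for source, stats in summarize(example).items():
    print(source, stats)
```

Comparing the cumulative distributions rather than only the means shows whether two sources differ across the whole scale or only in a few categories.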

Subjects hear each test item twice. After the first presentation they answer questions about the content of the item (8 s); after the second presentation they rate the scales (about 20 s).
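As a rough check of the one-hour figure, the listening and rating time can be computed from the numbers given for this test: 7 sources × 4 items = 28 items, each heard twice, followed by 8 s of content questions and about 20 s of rating. The sketch assumes (not stated in the description) that all samples have the same duration and that there are no extra pauses between items; the constant and function names are hypothetical.

```python
# Rough session-length estimate for the ITU-T overall quality test.
# Assumptions: one fixed sample duration for every item, and no pauses
# beyond the response times listed below.

N_SYSTEMS = 4            # synthesis systems under test
N_REFERENCES = 3         # reference conditions
ITEMS_PER_SOURCE = 4     # test items per system or reference
PRESENTATIONS = 2        # each item is heard twice
CONTENT_QUESTIONS_S = 8  # content questions after the first hearing
RATING_S = 20            # scale ratings after the second hearing


def listening_time_s(sample_s: float) -> float:
    """Seconds spent on presentations and responses for all items."""
    n_items = (N_SYSTEMS + N_REFERENCES) * ITEMS_PER_SOURCE
    per_item = PRESENTATIONS * sample_s + CONTENT_QUESTIONS_S + RATING_S
    return n_items * per_item


for sample_s in (10, 20, 30):
    minutes = listening_time_s(sample_s) / 60
    print(f"{sample_s:>2} s samples: {minutes:.1f} min of listening and rating")
```

Under these assumptions the listening and rating part takes roughly 22 to 41 minutes, which leaves the remainder of the hour for instructions and the training session.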
