ITU-T Overall Quality Test


Status: Proposal [ITU-T (1993)].
Goal: Comparative evaluation of overall quality aspects for longer stretches of speech.
Languages: In principle applicable to any language, as long as suitable stimulus material is available.
Items: Speech samples of 10 to 30 sec, adapted to the application. Example (mail order shopping): "Miss Robert, the running shoes Adidas Edberg Pro Club, colour: white, size: 11, reference: 501-97-52, price: 319 francs, will be delivered to you in 1 week." It is recommended that a (degraded) human reference is included.
Procedure: Rating on (a subset of) eight categorical estimation scales (see below).
Time: With 4 test items per system, testing 4 synthesis systems and 3 reference conditions (i.e. 7 different sources) takes about one hour for one group of subjects, including instructions to subjects and a training session.
Analysis: Histograms and/or cumulative distributions of the ratings per scale, and mean ratings. Little attention is paid to the answers related to content.
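
The analysis step can be sketched in a few lines of Python. This is only an illustration, not part of the ITU-T proposal: the function name summarize_ratings and the example ratings are hypothetical, and ratings are assumed to be integers from 1 to the number of categories on the scale.

```python
from collections import Counter


def summarize_ratings(ratings, n_categories):
    """Return (histogram, cumulative distribution, mean) for one scale.

    ratings      -- list of integer category numbers, 1..n_categories
    n_categories -- number of categories on the scale (assumed known)
    """
    if not ratings:
        raise ValueError("no ratings to summarize")
    if any(r < 1 or r > n_categories for r in ratings):
        raise ValueError("rating outside the scale")

    counts = Counter(ratings)
    total = len(ratings)

    # Histogram: number of ratings in each category, in category order.
    histogram = [counts.get(c, 0) for c in range(1, n_categories + 1)]

    # Cumulative distribution: proportion of ratings at or below each category.
    cumulative = []
    running = 0
    for count in histogram:
        running += count
        cumulative.append(running / total)

    mean = sum(ratings) / total
    return histogram, cumulative, mean


# Hypothetical ratings from eight subjects on a 5-point scale.
hist, cum, mean = summarize_ratings([1, 2, 2, 3, 3, 3, 4, 5], 5)
print(hist, cum, mean)
```

Running the summary once per scale and per source gives the per-scale histograms, cumulative distributions and mean ratings named under Analysis.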

The eight categorical estimation scales:

  1. Acceptance (Do you think that this voice could be used for such an information service by telephone?) 1: yes, 2: no.
  2. Overall impression (How do you rate the quality of the sound of what you have just heard?) 1: excellent, 2: good, 3: fair, 4: poor, 5: bad.
  3. Listening effort (How would you describe the effort you were required to make in order to understand the message?) 1: complete relaxation possible, no effort required, 2: attention necessary, no appreciable effort required, 3: moderate effort required, 4: effort required, 5: no meaning understood with any feasible effort.
  4. Comprehension problems (Did you find certain words hard to understand?) 1: never, 2: rarely, 3: occasionally, 4: often, 5: all of the time.
  5. Articulation (Were the sounds distinguishable?) 1: yes, very clear, 2: yes, clear enough, 3: fairly clear, 4: no, not very clear, 5: no, not at all.
  6. Pronunciation (Did you notice any anomalies in pronunciation?) 1: no, 2: yes, but not annoying, 3: yes, annoying, 4: yes, very annoying.
  7. Speaking rate (What do you think of the average speed of delivery?) 1: much faster than preferred, 2: faster than preferred, 3: preferred, 4: slower than preferred, 5: much slower than preferred.
  8. Voice pleasantness (How would you describe the voice?) 1: very pleasant, 2: pleasant, 3: fair, 4: unpleasant, 5: very unpleasant.
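
For data entry, the scales listed above can be held in a small lookup table. The sketch below is an assumption about how one might store them, not something the proposal prescribes; the names SCALES and is_valid_rating are hypothetical. The category counts are taken directly from the list: Acceptance has 2 categories, Pronunciation has 4, and the other six scales have 5.

```python
# Number of response categories per scale, as listed above.
SCALES = {
    "Acceptance": 2,
    "Overall impression": 5,
    "Listening effort": 5,
    "Comprehension problems": 5,
    "Articulation": 5,
    "Pronunciation": 4,
    "Speaking rate": 5,
    "Voice pleasantness": 5,
}


def is_valid_rating(scale, rating):
    """Check that a rating is an integer category that exists on the scale.

    Raises KeyError if the scale name is not one of the eight scales.
    """
    return isinstance(rating, int) and 1 <= rating <= SCALES[scale]


print(is_valid_rating("Pronunciation", 4))  # category 4 exists on this scale
print(is_valid_rating("Acceptance", 3))     # Acceptance only has categories 1 and 2
```

Note that the scales do not all point the same way: on most of them 1 is the most favourable answer, whereas on Speaking rate the preferred value is the midpoint 3, so a mean rating on that scale is best read as a deviation from 3.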

Subjects hear each test item twice. After the first presentation they answer questions related to the content of the test item (8 sec); after the second presentation they rate the scales (about 20 sec).
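
As a rough plausibility check on the stated session length, the timings above can be combined into an estimate of the pure listening and response time. This is a back-of-the-envelope sketch: the function name is hypothetical, and it assumes that every subject group hears all items from all sources, that all items have the same duration, and that there are no pauses between items.

```python
def listening_time_seconds(n_sources, items_per_source, item_sec,
                           content_sec=8, rating_sec=20):
    """Estimate listening plus response time for one session, in seconds.

    Each item is played twice; the first playback is followed by the
    content questions (content_sec), the second by the scale ratings
    (rating_sec). Pauses, instructions and training are not included.
    """
    per_item = 2 * item_sec + content_sec + rating_sec
    return n_sources * items_per_source * per_item


# 7 sources (4 synthesis systems + 3 references), 4 items per source.
shortest = listening_time_seconds(7, 4, item_sec=10)
longest = listening_time_seconds(7, 4, item_sec=30)
print(shortest / 60, longest / 60)  # minutes
```

Under these assumptions the estimate runs from about 22 minutes (10-sec items) to about 41 minutes (30-sec items), which is compatible with a total of about one hour once instructions and the training session are added.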

