SAM Overall Quality Test

Status Fully developed software (in SOAP) supporting the magnitude and categorical estimation scaling methods. Two variants are recommended: 20-point categorical estimation without a reference (for test-internal comparison) and magnitude estimation by line length, with imaginary ideal speech as the reference (for test-external comparison) (Howard-Jones92a, Chapter 7).

Goal Comparative evaluation of overall quality aspects, particularly acceptability, intelligibility, and naturalness, for longer stretches of speech.

Languages In principle applicable to any language as long as suitable stimulus material is available.

Items Eight lists of 20 meaningful sentences of varying syntactic structure and length. For the rating of intelligibility and naturalness, speech material is available for Dutch, English, French, German, Italian, and Swedish. One list is sufficient for the evaluation of a synthesiser. Examples: "I realise you're having supply problems, but this is rather excessive" and "I need to arrive by 10.30 a.m. on Saturday."

Procedure Each aspect of speech is rated by a different group of subjects (at least ten per group). When rating acceptability, it is recommended that application-specific speech materials are presented to (prospective) users. Each rating is based on two sentences.

Time With 160 sentences and a 5 s interstimulus interval, rating one scale takes about 20 min.
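The time estimate can be checked with a rough calculation. The sketch below is illustrative only and not part of the SOAP software: the function name and the 10 s playback time per two-sentence stimulus are assumptions, as is the reading that one interstimulus interval follows each rated pair of sentences.

```python
def session_minutes(n_sentences=160, sentences_per_rating=2,
                    isi_s=5.0, stimulus_s=10.0):
    """Estimate the duration of one rating scale in minutes.

    stimulus_s (playback time of one two-sentence stimulus) is an
    assumed value; the source gives only the sentence count and the
    interstimulus interval.
    """
    # Each rating is based on two sentences, so 160 sentences give 80 ratings.
    n_ratings = n_sentences // sentences_per_rating
    # Each rating costs the playback time plus one interstimulus interval.
    total_s = n_ratings * (stimulus_s + isi_s)
    return total_s / 60.0


print(session_minutes())
```

Under these assumptions, 80 ratings of 15 s each come to 20 min, which matches the figure quoted above; shorter or longer stimuli shift the estimate proportionally.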

Analysis Automatic.
