Parametric all-prosodic MBROLA PHO file generator

Help file - Dafydd Gibbon, 2007-12-03

General description

The term "all-prosodic" is borrowed from the description of the YorkTalk parametric synthesiser. In the present context, it means that the only parameters which can be controlled are prosodic parameters, according to a particular model (see below). No syntactic parsing is involved, stress positions are not calculated, out-of-vocabulary (OOV) words are not accepted, and the grapheme-to-phoneme conversion is entirely lexicon-based.

The Parametric All-Prosodic MBROLA PHO file generator requires two inputs:

Pronunciation lexicon (currently fixed; an editable lexicon is planned). The pronunciation lexicon has two columns: the orthographic form of each word as used in the input, and its SAMPA phonemic transcription, using the phoneme set required by the selected MBROLA voice.
Input sentence containing words from the pronunciation lexicon and punctuation marks.
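The lexicon lookup can be illustrated with a minimal Python sketch; it is not the generator's own code. The lexicon file layout assumed here (one entry per line: the orthographic form followed by space-separated SAMPA phonemes) and the function names `parse_lexicon` and `transcribe` are hypothetical. The point it shows is that an out-of-vocabulary word raises an error instead of being guessed.

```python
# Minimal sketch of lexicon-based grapheme-to-phoneme conversion.
# ASSUMPTION: one entry per line, "orthography phoneme phoneme ...",
# with SAMPA phonemes separated by spaces (e.g. "hello h @ l @U").
# parse_lexicon and transcribe are hypothetical names.


def parse_lexicon(text: str) -> dict[str, list[str]]:
    """Map each lower-cased orthographic form to its list of SAMPA phonemes."""
    lexicon: dict[str, list[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        lexicon[fields[0].lower()] = fields[1:]
    return lexicon


def transcribe(words: list[str], lexicon: dict[str, list[str]]) -> list[list[str]]:
    """Look up every word; OOV words are rejected, not guessed."""
    result = []
    for word in words:
        key = word.lower()
        if key not in lexicon:
            raise KeyError(f"out-of-vocabulary word: {word!r}")
        result.append(lexicon[key])
    return result


lexicon = parse_lexicon("hello h @ l @U\nworld w 3: l d\n")
print(transcribe(["Hello", "world"], lexicon))
# [['h', '@', 'l', '@U'], ['w', '3:', 'l', 'd']]
```

Storing the transcription as a list of phonemes, rather than as a single string, avoids having to re-segment multi-character SAMPA symbols such as "@U" later on.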

The prosodic processing is currently based entirely on punctuation marks, which are treated as markup for pause positions, and on the assumptions of the pitch and duration models described below. No grammatical processing (part-of-speech tagging etc.) is involved in the present version.
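Treating punctuation as pause markup amounts to splitting the sentence into phrases at every punctuation mark. The Python sketch below assumes a particular set of punctuation marks and, as in the current version, treats them all alike; the names `PUNCTUATION`, `TOKEN_RE` and `split_phrases` are hypothetical.

```python
import re

# ASSUMPTION: the set of punctuation marks recognised as pause markup.
PUNCTUATION = set(".,;:!?")
# A token is either a run of non-space, non-punctuation characters (a word)
# or a single punctuation mark.
TOKEN_RE = re.compile(r"[^\s.,;:!?]+|[.,;:!?]")


def split_phrases(sentence: str) -> list[list[str]]:
    """Group words into phrases; every punctuation mark ends a phrase,
    i.e. marks a pause position. All marks are treated alike."""
    phrases: list[list[str]] = []
    current: list[str] = []
    for token in TOKEN_RE.findall(sentence):
        if token in PUNCTUATION:
            if current:  # ignore repeated marks such as "?!"
                phrases.append(current)
                current = []
        else:
            current.append(token)
    if current:  # a final phrase without closing punctuation
        phrases.append(current)
    return phrases


print(split_phrases("Yes, thank you. Goodbye"))
# [['Yes'], ['thank', 'you'], ['Goodbye']]
```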

Definitions and processing strategy

The input sentence is processed as follows:

The orthographic form of each word is converted into a phonemic representation in SAMPA notation (a machine-readable ASCII encoding of IPA) by lookup in a simple two-column tabular pronunciation lexicon.
The punctuation marks are interpreted as markup for pauses, and can be used to adjust the phrasing. At the moment, all punctuation marks are treated alike.
The Pitch Model specifies the global pitch pattern and the final pitch pattern.
- The Global Pitch Contour Model is an asymptotically falling pitch contour with adjustable parameters: the baseline (an idealised minimal level which is never quite reached), the onset (the initial pitch level, which is added to the baseline) and the slope (the declination factor, which may be falling or rising). The pitch at each position is the baseline plus the onset multiplied by the slope raised to the power of the position in the utterance:
  
  pitch_i = baseline + onset * slopeⁱ
  That is, the pitch of phoneme_i, the phoneme at position i in the sentence, is the baseline plus the product of the onset (initial pitch) and the declination, where the declination is the slope raised to the power of i. If slope < 1, the declination shrinks towards zero, so the pitch falls asymptotically towards the baseline but never quite reaches it. The model is closely related to the so-called linear model of Pierrehumbert & Liberman (1984), which computes each pitch value from the preceding one:
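  pitch_i = (pitch_(i-1) - baseline) * slope + baseline
  Unrolling this recurrence from pitch_0 = baseline + onset yields pitch_i = baseline + onset * slopeⁱ, so the two formulations produce the same contour.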
  Note that the term "linear" is misleading: first, because the function is asymptotic; second, because the baseline, onset (not specified in the Pierrehumbert & Liberman model) and declination components are either global constants or vary independently of the preceding pitch.
- Jitter (in this context) is a random variation of pitch from one phoneme to the next. In other contexts, jitter has a technical meaning, usually as a kind of frequency modulation noise produced by mechanical vibrations.
- The Final Pitch Contour Model applies a different pitch slope during a specified interval immediately before a pause.
The Duration Model specifies the global tempo and the final lengthening pattern.
- Tempo specifies a factor by which the durations of all phonemes in the sentence are scaled, lengthening or shortening them proportionally.
- The Final Lengthening Model slows down the tempo during a specified interval immediately before a pause.
The Accent Model, which specifies pitch changes on individual words, is not yet available.
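The pitch and duration models above can be combined in a single pass over the phonemes. The following Python sketch is an illustration under stated assumptions, not the generator's implementation: the parameter names and default values, the fixed intrinsic duration `base_dur` of every phoneme, the pause duration, the use of "_" as the pause symbol, and the reading of the "specified interval" as the last `final_len` phonemes of each phrase are all hypothetical. The global contour uses the closed form pitch_i = baseline + onset * slopeⁱ; inside the final interval the sketch switches to the recurrence form with a separate `final_slope`.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prosody:
    # All names and default values are HYPOTHETICAL illustrations.
    baseline: float = 80.0          # Hz; asymptote, never quite reached
    onset: float = 60.0             # Hz added to the baseline at i = 0
    slope: float = 0.97             # < 1: falling contour, > 1: rising
    jitter: float = 0.0             # maximum random pitch deviation in Hz
    final_slope: float = 0.9        # slope inside the pre-pause interval
    final_len: int = 2              # phonemes in the pre-pause interval
    tempo: float = 1.0              # duration factor; > 1 slows speech down
    final_lengthening: float = 1.5  # extra duration factor before a pause


def global_pitch(i: int, p: Prosody) -> float:
    """Global contour: pitch_i = baseline + onset * slope**i."""
    return p.baseline + p.onset * p.slope ** i


def prosodify(phrases: list[list[str]], p: Prosody, base_dur: float = 80.0,
              pause_dur: float = 200.0, rng: Optional[random.Random] = None
              ) -> list[tuple[str, float, Optional[float]]]:
    """Return (phoneme, duration in ms, pitch in Hz) for each phoneme,
    with a pause record "_" (no pitch) after every phrase."""
    if rng is None:
        rng = random.Random(0)
    records: list[tuple[str, float, Optional[float]]] = []
    i = 0  # position of the phoneme in the whole sentence
    for phrase in phrases:
        final_start = len(phrase) - p.final_len
        prev: Optional[float] = None  # un-jittered pitch of previous phoneme
        for k, phoneme in enumerate(phrase):
            in_final = k >= final_start
            if in_final and prev is not None:
                # Final pitch model: recurrence form with its own slope.
                f0 = (prev - p.baseline) * p.final_slope + p.baseline
            else:
                f0 = global_pitch(i, p)
            dur = base_dur * p.tempo       # global tempo
            if in_final:
                dur *= p.final_lengthening  # final lengthening
            prev = f0
            # Jitter: random pitch variation from one phoneme to the next.
            records.append((phoneme, dur, f0 + rng.uniform(-p.jitter, p.jitter)))
            i += 1
        records.append(("_", pause_dur, None))
    return records


# With final_slope == slope, the recurrence reproduces the closed form.
p = Prosody(baseline=100, onset=50, slope=0.5, final_slope=0.5,
            final_len=1, final_lengthening=2.0)
for record in prosodify([["a", "b", "c"]], p):
    print(record)
# ('a', 80.0, 150.0)
# ('b', 80.0, 125.0)
# ('c', 160.0, 112.5)
# ('_', 200.0, None)
```

Keeping the un-jittered pitch in `prev` means jitter adds local variation without accumulating into the declination.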

Output

The output of the Parametric All-Prosodic MBROLA PHO file generator is a PHO file with the following specifications:

1. Metadata header (comment lines, beginning with ";" as in all PHO files), consisting of:
  1. File ID line.
  2. Phonetisation and prosody parameter values, as transferred from the HTML input form to the CGI script for processing.
  3. Input sentence.
  4. Phonemic transcription of the input sentence.
2. MBROLA phoneme records, one per line, each with 6 fields:
  1. Phoneme.
  2. Duration of the phoneme, in milliseconds.
  3. Position of the first pitch value, as a percentage of the phoneme's duration.
  4. First pitch value, in Hz.
  5. Position of the second pitch value, as a percentage of the phoneme's duration.
  6. Second pitch value, in Hz.
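This layout can be illustrated with a small formatter. The Python sketch is hypothetical in its details: the function name `write_pho`, the way parameters are listed in the header, the rounding to whole numbers, and the placing of the two pitch values at 0% and 100% of each phoneme, with the second value taken from the next phoneme so that the contour is continuous, are assumptions rather than the generator's documented behaviour. Pause records carry no pitch value and are written with phoneme and duration only, which the PHO format permits.

```python
from typing import Optional


def write_pho(file_id: str, params: dict[str, object], sentence: str,
              transcription: str,
              records: list[tuple[str, float, Optional[float]]]) -> str:
    """Render a PHO file: four comment lines, then one record per phoneme."""
    lines = [
        f"; {file_id}",                                           # 1. file ID
        "; " + " ".join(f"{k}={v}" for k, v in params.items()),  # 2. parameters
        f"; {sentence}",                                          # 3. sentence
        f"; {transcription}",                                     # 4. transcription
    ]
    for n, (phoneme, dur, f0) in enumerate(records):
        if f0 is None:  # pause: phoneme and duration only
            lines.append(f"{phoneme} {dur:.0f}")
            continue
        # ASSUMPTION: second pitch point = pitch of the next phoneme, if any.
        nxt = records[n + 1][2] if n + 1 < len(records) else None
        f0_end = f0 if nxt is None else nxt
        # Fields: phoneme, duration (ms), 0 %, pitch (Hz), 100 %, pitch (Hz)
        lines.append(f"{phoneme} {dur:.0f} 0 {f0:.0f} 100 {f0_end:.0f}")
    return "\n".join(lines) + "\n"


print(write_pho("hello.pho", {"slope": 0.97, "tempo": 1.0}, "Hello.",
                "h @ l @U", [("h", 80.0, 150.0), ("@", 80.0, 125.0),
                             ("_", 200.0, None)]), end="")
# ; hello.pho
# ; slope=0.97 tempo=1.0
# ; Hello.
# ; h @ l @U
# h 80 0 150 100 125
# @ 80 0 125 100 125
# _ 200
```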

Dafydd Gibbon, Tue Dec 4 19:35:23 MET 2007