Parametric all-prosodic MBROLA PHO file generator

Help file - Dafydd Gibbon, 2007-12-03


General description

The term "all-prosodic" is borrowed from the description of the YorkTalk parametric synthesiser. In the present context, it means that the only parameters which can be controlled are prosodic parameters, according to a particular model (see below). No syntactic parsing and no calculation of stress positions are involved, out-of-vocabulary (OOV) words are not accepted, and the grapheme-to-phoneme conversion is entirely lexicon-based.

The Parametric All-Prosodic MBROLA PHO file generator requires two inputs:

  1. Pronunciation lexicon (currently fixed; an editable lexicon is planned). The pronunciation lexicon has two columns: the orthography used in the input, and a phonemic transcription in SAMPA, as required by the selected MBROLA voice.
  2. Input sentence containing words from the pronunciation lexicon and punctuation marks.

The prosodic processing is currently based entirely on punctuation marks, which are treated as markup for pause positions, and on assumptions about pitch and duration modelling. No grammar processing (POS etc.) is involved in the present version.
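
The two inputs can be illustrated with a minimal Python sketch. It is not the generator's own code: the tab separator, the sample entries, and the names load_lexicon and phonetise are assumptions made for illustration, and "_" is used as the pause symbol because it is the usual MBROLA silence symbol (check the documentation of the selected voice).

```python
import re

# Hypothetical two-column lexicon: orthography, then SAMPA with
# space-separated phoneme symbols. The entries are illustrative only.
LEXICON_TEXT = """\
the\tD @
cat\tk { t
sat\ts { t
"""

PAUSE = "_"  # assumed pause symbol (usual MBROLA silence symbol)


def load_lexicon(text):
    """Parse 'orthography<TAB>SAMPA' lines into a dict of phoneme lists."""
    lexicon = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        word, sampa = line.split("\t", 1)
        lexicon[word.strip().lower()] = sampa.split()
    return lexicon


def phonetise(sentence, lexicon):
    """Return a flat phoneme list; every punctuation mark becomes a pause."""
    phonemes = []
    for token in re.findall(r"\w+|[^\w\s]", sentence):
        if re.fullmatch(r"\w+", token):
            key = token.lower()
            if key not in lexicon:
                # OOV words are rejected, as described above.
                raise ValueError(f"out-of-vocabulary word: {token!r}")
            phonemes.extend(lexicon[key])
        else:
            # All punctuation marks are treated alike.
            phonemes.append(PAUSE)
    return phonemes
```

With this sketch, phonetise("The cat, sat.", load_lexicon(LEXICON_TEXT)) yields the phonemes of the three words with a pause after "cat" and after "sat", while a sentence containing a word missing from the lexicon raises an error.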


Definitions and processing strategy

The input sentence is processed as follows:

  1. The orthographic representations of words are converted into an IPA representation (in SAMPA notation) by lookup in a simple two-column tabular pronunciation lexicon.
  2. The punctuation marks are interpreted as markup for pauses, and can be used to adjust the phrasing. At the moment, all punctuation marks are treated alike.
  3. The Pitch Model specifies the global pitch pattern and the final pitch pattern.
  4. The Duration Model specifies the global tempo and the final lengthening pattern.
  5. The Accent Model, which specifies pitch changes on individual words, is not yet available.
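
The Pitch Model and Duration Model steps could be sketched as follows. This is only one possible reading, not the model actually implemented: it assumes a linear pitch line from start_hz to end_hz over the utterance, a single final pitch target on the last phoneme, a uniform base duration scaled by tempo, and lengthening of the last final_count phonemes of the utterance. All parameter names and default values are hypothetical.

```python
def apply_prosody(phonemes, start_hz=140.0, end_hz=100.0, final_hz=80.0,
                  base_ms=80.0, tempo=1.0, final_factor=1.5, final_count=2,
                  pause_ms=200.0, pause="_"):
    """Return (phoneme, duration_ms, first_pitch_hz, second_pitch_hz) tuples.

    Pauses carry no pitch values (None). All parameters are hypothetical.
    """
    speech_idx = [i for i, p in enumerate(phonemes) if p != pause]
    n = len(speech_idx)
    # Phonemes that receive final lengthening (utterance-final only).
    final_idx = set(speech_idx[-final_count:]) if final_count > 0 else set()

    rows = []
    k = 0  # rank of the current phoneme among the non-pause phonemes
    for i, p in enumerate(phonemes):
        if p == pause:
            rows.append((p, round(pause_ms / tempo), None, None))
            continue
        # Global pitch pattern: linear interpolation over the utterance.
        f0_first = start_hz + (end_hz - start_hz) * k / n
        f0_second = start_hz + (end_hz - start_hz) * (k + 1) / n
        # Final pitch pattern: the last phoneme ends on final_hz.
        if i == speech_idx[-1]:
            f0_second = final_hz
        # Global tempo: a higher tempo value gives shorter durations.
        dur = base_ms / tempo
        # Final lengthening pattern.
        if i in final_idx:
            dur *= final_factor
        rows.append((p, round(dur), round(f0_first), round(f0_second)))
        k += 1
    return rows
```

For example, apply_prosody(["D", "@", "k", "{", "t", "_"]) gives five phonemes with falling pitch values, the last two lengthened, followed by a pause without pitch values.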

Output

The output of the Parametric All-Prosodic MBROLA PHO file generator is a PHO file with the following specifications:

  1. Metadata header (comment lines) consisting of
    1. File ID line.
    2. Phonetisation and prosody parameter values which are transferred from the HTML input form to the CGI script for processing.
    3. Input sentence.
    4. Phonemic transcription of input sentence.
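  2. MBROLA phoneme records with 6 fields:
    1. Phoneme.
    2. Duration of the phoneme (in milliseconds).
    3. Position of first pitch value (as a percentage of the phoneme duration).
    4. First pitch value (in Hz).
    5. Position of second pitch value (as a percentage of the phoneme duration).
    6. Second pitch value (in Hz).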
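
The file layout described above can be sketched in Python as follows. The sketch relies on two facts about the MBROLA PHO format (comment lines begin with ";", and a record consists of whitespace-separated fields); everything else is an assumption made for illustration: the function names, the wording of the header lines, the default pitch positions 0 and 100, and the choice to write pauses with only a phoneme and a duration.

```python
def format_record(phoneme, dur_ms, f0_first=None, f0_second=None,
                  pos_first=0, pos_second=100):
    """Format one PHO record; positions are percentages of the duration."""
    if f0_first is None or f0_second is None:
        # Assumption: pauses are written without pitch values.
        return f"{phoneme} {dur_ms}"
    return f"{phoneme} {dur_ms} {pos_first} {f0_first} {pos_second} {f0_second}"


def build_pho(file_id, params, sentence, transcription, rows):
    """Assemble the comment header followed by the phoneme records."""
    header = [f"; {file_id}"]                                # file ID line
    header += [f"; {name} = {value}" for name, value in params.items()]
    header += [f"; {sentence}", f"; {transcription}"]
    body = [format_record(*row) for row in rows]
    return "\n".join(header + body) + "\n"
```

A call such as build_pho("demo.pho", {"tempo": 1.0}, "The cat.", "D @ k { t", rows), where rows holds (phoneme, duration, first pitch, second pitch) tuples, returns the complete file content as one string, which can then be written to disk and passed to MBROLA.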

Dafydd Gibbon, Tue Dec 4 19:35:23 MET 2007