The speech recogniser output is the raw information returned to the application. As mentioned above, this may be a word label (or a lexicon-entry identifier), returned with other detailed data such as the duration of the speech signal and its energy level (which may indicate whether the user is speaking loudly). The system may return the N best candidates, that is, the recognised words ranked by likelihood (probability score or distance measure). Continuous speech recognisers may return a parsed sentence or a lattice of lexicon entries for analysis by a linguistic module. A word spotting system may return the number of recognised words and their respective labels, together with the time frames in which they occur.
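As an illustration only, the kind of result record described above could be sketched as follows. The names `Candidate`, `RecognitionResult` and `n_best`, the field names, the units, and the convention that a higher score means a better candidate are all assumptions made for this sketch; they do not describe the interface of any particular recogniser.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One recognition hypothesis (hypothetical structure)."""
    label: str    # word label or lexicon-entry identifier
    score: float  # assumed convention: higher means more likely


@dataclass
class RecognitionResult:
    """Raw information a recogniser might hand back to the application."""
    candidates: List[Candidate]
    duration_s: float  # duration of the speech signal, in seconds (assumed unit)
    energy_db: float   # energy level, in dB relative to full scale (assumed unit)

    def n_best(self, n: int) -> List[Candidate]:
        """Return the n highest-scoring candidates, best first."""
        ranked = sorted(self.candidates, key=lambda c: c.score, reverse=True)
        return ranked[:n]


# Made-up values, for illustration only.
result = RecognitionResult(
    candidates=[
        Candidate("four", 0.21),
        Candidate("for", 0.64),
        Candidate("fore", 0.15),
    ],
    duration_s=0.42,
    energy_db=-18.0,
)
best_two = [c.label for c in result.n_best(2)]
```

A recogniser that reports distance measures rather than probability scores would rank in the opposite direction (smaller is better), so the sort order would need to be reversed in that case.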
For systems that use beeps to indicate the user's turn, the application developer may need to know whether the user speaks too close to the beep, does not speak at all, speaks loudly, speaks in excessively noisy conditions, and so on. Such information is relevant and may be processed at the application or dialogue stage.
The application developer may need to set some of these parameters and should therefore have access to them.
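A minimal sketch of how an application might flag the beep-related conditions mentioned above, with the thresholds exposed as parameters the developer can set. The names `TurnMeasurement` and `diagnose_turn`, the flag strings, and all default threshold values are hypothetical choices for this example, not values taken from any real system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TurnMeasurement:
    """Measurements for one user turn (hypothetical structure)."""
    beep_end_s: float                # time at which the beep ends
    speech_start_s: Optional[float]  # None if no speech was detected
    speech_energy_db: float          # energy of the speech segment
    noise_energy_db: float           # energy of the background noise


def diagnose_turn(
    m: TurnMeasurement,
    min_gap_s: float = 0.2,    # assumed minimum gap between beep and speech
    loud_db: float = -6.0,     # assumed level above which speech counts as loud
    min_snr_db: float = 10.0,  # assumed minimum speech-to-noise ratio
) -> List[str]:
    """Return a list of condition flags for the dialogue stage to act on."""
    if m.speech_start_s is None:
        return ["no_speech"]
    flags = []
    if m.speech_start_s - m.beep_end_s < min_gap_s:
        flags.append("too_close_to_beep")
    if m.speech_energy_db > loud_db:
        flags.append("loud")
    if m.speech_energy_db - m.noise_energy_db < min_snr_db:
        flags.append("too_noisy")
    return flags


# Made-up measurements, for illustration only.
silent = diagnose_turn(TurnMeasurement(1.0, None, 0.0, -40.0))
clean = diagnose_turn(TurnMeasurement(1.0, 1.5, -20.0, -45.0))
problem = diagnose_turn(TurnMeasurement(1.0, 1.05, -3.0, -8.0))
```

Keeping the thresholds as function arguments rather than constants reflects the point that the developer should be able to adjust them, for example by passing a larger `min_gap_s` for a system with a long beep.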