The first feature is the speaking mode: it can be isolated words (each word is pronounced separately, with pauses between two successive words), connected words (typically used when spelling names or giving phone numbers digit by digit), or continuous speech (fluent speech).
Within each of the three speaking modes, the speech input can be spontaneous or read (for example, scripted speech for data entry by computer operators, or a text dictated to a secretary from a manuscript document).
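As a minimal sketch of how an application might record these two features in its configuration, the enumerations below are purely illustrative; the names are not taken from any standard and are assumptions of this example.

```python
from enum import Enum, auto

class SpeakingMode(Enum):
    ISOLATED = auto()    # words pronounced separately, with pauses in between
    CONNECTED = auto()   # e.g. spelling names or giving phone numbers digit by digit
    CONTINUOUS = auto()  # fluent speech

class InputStyle(Enum):
    READ = auto()         # scripted speech or text dictated from a manuscript
    SPONTANEOUS = auto()  # unscripted speech
```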
The speech production rate obviously varies from one speaker to another. The speaking rate also depends on the operating conditions, in particular stressful ones such as adverse physical environments, and it can be slow, normal or fast. It may be measured by the statistical distribution of the average number of speech frames within a given set of sentences. If a performance result is obtained with a particular speaking rate that will not occur in operational use, this has to be specified. The application developer may therefore require a tool to measure the speaking rate.
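One possible sketch of such a measurement tool is given below. It assumes that a voice-activity detection or forced-alignment step (not shown) has already produced, for each sentence, the number of speech frames and the number of words; the function name, the frame shift of 10 ms and the frames-per-word measure are assumptions of this example, not a prescribed method.

```python
from statistics import mean, stdev

def speaking_rate_stats(sentences, frame_shift_ms=10.0):
    """Summarise speaking rate over a set of sentences.

    `sentences` is a list of (num_speech_frames, num_words) pairs obtained
    from a voice-activity or alignment step.  Returns the mean and standard
    deviation of speech frames per word, plus the equivalent words-per-minute
    figure for the given frame shift.
    """
    frames_per_word = [f / w for f, w in sentences if w > 0]
    mu = mean(frames_per_word)
    sigma = stdev(frames_per_word) if len(frames_per_word) > 1 else 0.0
    # One word lasts mu * frame_shift_ms milliseconds on average.
    words_per_minute = 60_000.0 / (mu * frame_shift_ms)
    return {"frames_per_word_mean": mu,
            "frames_per_word_std": sigma,
            "words_per_minute": words_per_minute}

# Example: three sentences with (speech frames, words) counts.
print(speaking_rate_stats([(420, 7), (380, 6), (510, 9)]))
```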
The users are likely to produce acoustic events that are not relevant to the application, such as coughs, sneezes, throat clearing, lip smacks, clicks, etc. These extra-linguistic phenomena (or non-linguistic phenomena) may be handled as part of the speech modelling (implementation of the rejection mode described below), or may be tackled at the linguistic level or at other higher levels.
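As a minimal illustration of rejection at the modelling level, the sketch below compares the score of the best word hypothesis against that of a generic "garbage" (filler) model covering extra-linguistic events. The function name, the log-likelihood inputs and the threshold rule are assumptions of this example; they stand in for whatever rejection mechanism the recogniser actually provides.

```python
def classify_segment(word_score, garbage_score, threshold=0.0):
    """Simple rejection rule for extra-linguistic events.

    `word_score` and `garbage_score` are log-likelihoods of the segment
    under the best word hypothesis and under a generic garbage/filler
    model (covering coughs, lip smacks, clicks, ...).  The segment is
    rejected when the word hypothesis does not beat the filler model by
    at least `threshold`, which acts as a tunable operating point.
    """
    return "accept" if (word_score - garbage_score) >= threshold else "reject"

# A cough typically scores better under the filler model than under any word.
print(classify_segment(word_score=-812.4, garbage_score=-790.1))  # reject
print(classify_segment(word_score=-640.2, garbage_score=-705.8))  # accept
```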
The application developer has to know whether these phenomena are handled and, if any intervention is needed, how to tune the system for that purpose.