In a collaborative project between PTT Research, Philips Research Aachen and the Netherlands Organization for Scientific Research (NWO), we are developing a Dutch version of the Train Time Table Information service that is already available for German and that is described in another paper in these Proceedings. The system we have in mind can best be characterised as a guided mixed-initiative dialogue system: the system will ask specific questions, such as ``From where to where do you want to travel?'', but it will allow the user to give underinformative and overinformative answers. If an answer is underinformative, the system will ask explicit questions to elicit the missing information. If an answer is overinformative, for instance when the caller adds the desired arrival time to the departure and destination stations, the system will try to process that additional information too. If time and date information is not offered spontaneously, the system will again ask explicit questions to obtain it.
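The guided mixed-initiative strategy described above can be illustrated with a minimal slot-filling sketch in Python. The slot names, the prompt wordings other than the opening question, and the helper functions `update_slots` and `next_prompt` are assumptions made for illustration only; they are not taken from the actual system.

```python
from typing import Dict, Optional

# The opening question is quoted from the text; all other prompts are hypothetical.
OPENING_PROMPT = "From where to where do you want to travel?"
PROMPTS: Dict[str, str] = {
    "origin": "From where do you want to travel?",
    "destination": "Where do you want to travel to?",
    "date": "On what date do you want to travel?",
    "time": "At what time do you want to travel?",
}
# Hypothetical order in which missing items are asked for.
SLOT_ORDER = ["origin", "destination", "date", "time"]


def update_slots(slots: Dict[str, str],
                 parsed: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Merge everything the caller said into the dialogue state.

    Overinformative answers are handled by accepting any known slot,
    not only the one that was asked for.
    """
    merged = dict(slots)
    for name, value in parsed.items():
        if name in PROMPTS and value is not None:
            merged[name] = value
    return merged


def next_prompt(slots: Dict[str, str]) -> Optional[str]:
    """Return the next explicit question, or None when all slots are filled.

    Underinformative answers leave slots empty, so the system asks
    explicitly for the first missing item.
    """
    if "origin" not in slots and "destination" not in slots:
        return OPENING_PROMPT
    for name in SLOT_ORDER:
        if name not in slots:
            return PROMPTS[name]
    return None
```

In this sketch, a caller who answers the opening question with only a departure station is next asked for the destination. A caller who then volunteers a time along with the destination is not asked for the time again, only for the date.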
Clearly, a Train Time Table Information System is an application intended to be used by the general public. Moreover, most users will call the system only occasionally, so that one cannot rely on users getting acquainted with the peculiarities of the service. Pilot experiments carried out by the Nederlandse Spoorwegen, the Dutch Railway Company, have shown that the part of the public who need Time Table information are neither able nor willing to deal with a menu-based interface. Thus, there seems to be no alternative to starting an automated service with a mixed-initiative dialogue system.
To implement a Dutch version of the Train Time Table Information System a number of steps must be taken:
The POLYPHONE corpus has been instrumental in all these steps.
For the development of the phoneme-based recogniser we have used the phonetically rich sentences in POLYPHONE. Following the approach that has proved successful for German, we have started with a recogniser based on context-independent phone models. The recogniser has been trained on the assumption that the automatic grapheme-to-phoneme transcription of the transliteration data is correct. That assumption is probably wrong to some extent, since Dutch shows considerable pronunciation variation at the phonemic level. At the time of writing we are using the POLYPHONE recordings for an empirical investigation of the range of that variation. Up to now, researchers have had to be content with rather subjective ideas about this crucial issue.
An essential part of the recognition engine in a Train Time Table system is a lexicon comprising phonemic representations of station names. Here too, there is non-negligible pronunciation variation. Since all station names have been read by at least five speakers, we can use the POLYPHONE recordings to make an inventory of the pronunciations. This is especially relevant for the names of the smaller stations, since pronunciation variants for larger stations can also be collected by other means.
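As a rough illustration of what making an inventory of the pronunciations could look like, the following Python sketch counts the distinct phonemic transcriptions observed per station name. The function `build_variant_inventory`, its `min_count` parameter and the input format (pairs of station name and transcription string) are assumptions for the sake of the example, not a description of the procedure actually applied to POLYPHONE.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def build_variant_inventory(
    observations: Iterable[Tuple[str, str]],
    min_count: int = 1,
) -> Dict[str, List[Tuple[str, int]]]:
    """Collect pronunciation variants per station name.

    observations: (station_name, phonemic_transcription) pairs, one per
    recorded utterance (hypothetical input format).
    min_count: variants observed fewer times than this are dropped.
    Returns, per station, the variants ordered from most to least frequent.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for name, transcription in observations:
        counts[name][transcription] += 1
    return {
        name: [(p, c) for p, c in counter.most_common() if c >= min_count]
        for name, counter in counts.items()
    }
```

The resulting lists could serve as candidate lexicon entries for each station. With only a handful of speakers per name, any frequency threshold such as `min_count` would have to be chosen with care.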
Virtually every information dialogue contains yes/no questions. Previous applications of ASR in telephone information systems for the general public have shown that there is considerable variation in the way people answer these questions. Since POLYPHONE contains four yes/no questions, all to be answered spontaneously, we have a substantial amount of data to build a model of the answers.
The analysis of the answers that we have performed so far confirms the existence of substantial variation; yet it appears that the very large majority of the expressions adhere to a simple schema, so that it is easy to build a model. We have seen a large difference between the two items for which we expected affirmative responses: almost 93% of the subjects used a single word (e.g. ja, jawel, jazeker) to confirm the assumption that Dutch was their native language, whereas the proportion of one-word confirmations dropped to 75% for the question whether the caller was willing to participate in another recording session. Very few callers said ``no'' to the latter question, but the way in which they expressed their confirmation was much more varied.
Of the subjects, 83% used a single word (e.g. nee, neen) to deny that they had ever lived abroad for an extended period of time. Most of the people who used more complicated expressions did so to tell us in what foreign countries they had lived. Similarly, 80% of the callers used a single word to deny that they were using a cordless phone; over 13% of the callers said they were using a cordless phone.
A detailed analysis of the more verbose answers showed that only a very small proportion of the affirmative answers contained no-words and that the same is true for negative answers and yes-words.
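The observation that yes-words and no-words rarely occur in answers of the opposite polarity suggests that even a simple keyword test goes a long way. The sketch below is one possible reading of such a schema, not the model used in the system: the word lists contain only the examples quoted in the text, and the function name `classify_answer` and the label `unknown` are assumptions.

```python
import re

# Only the example words quoted in the text; a real lexicon would be larger.
YES_WORDS = {"ja", "jawel", "jazeker"}
NO_WORDS = {"nee", "neen"}


def classify_answer(utterance: str) -> str:
    """Classify a transcribed answer as 'yes', 'no' or 'unknown'.

    An answer counts as affirmative if it contains a yes-word and no
    no-word, and as negative in the opposite case. Answers with neither
    or both kinds of word are left undecided.
    """
    tokens = re.findall(r"\w+", utterance.lower())
    has_yes = any(token in YES_WORDS for token in tokens)
    has_no = any(token in NO_WORDS for token in tokens)
    if has_yes and not has_no:
        return "yes"
    if has_no and not has_yes:
        return "no"
    return "unknown"
```

Single-word answers and longer answers that contain one of the listed words are handled by the same test. The undecided cases are the ones that would need closer analysis or a repeated question.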
Another observation worth mentioning is that politeness forms like yes, sir or no, ma'am were virtually absent. This may be due to the fact that the yes/no questions were located in the last part of the recording session, when the callers should have been fully aware that they were talking to a computer. However, it is also possible that what we see reflects the growing casualness of Dutch society, where ``speaking with two words'' is quickly becoming the exception rather than the rule.
All these observations confirm our expectation that the NLP module in our system should be able to handle the large majority of the yes/no expressions that will be used by the callers. In confirmation subdialogues in an information system (e.g. after the caller has given departure and/or destination station) the language model expects an affirmative expression, but negations may occur due to errors of the recogniser. The POLYPHONE corpus contains a number of examples of negations where confirmations were expected. We are working on a closer analysis of these cases, to find out whether they contain systematic syntactic structures that could help in making the language model more specific.
In previous experiments with information and reservation systems it has appeared that, quite surprisingly, linguists do not have accurate models of the way in which people express times and dates. The POLYPHONE corpus contains a large number of these expressions. Currently we are analysing the syntax of these expressions in order to build models for use in the Train Time Table Information system. Unfortunately, the expressions used by the POLYPHONE speakers are to a large extent determined by the way in which the items were printed on the response sheets. No spontaneous expressions of dates or times were obtained. This will make it very difficult to derive reliable estimates of the relative frequency with which individual expressions will occur.