Abstract
This paper describes the evaluation methodology and results of the DARPA Communicator spoken dialog system evaluation experiments in 2000 and 2001. Nine spoken dialog systems in the travel planning domain participated in the experiments, resulting in a total corpus of 1904 dialogs. We describe and compare the experimental designs of the 2000 and 2001 DARPA evaluations, and we describe how we established a performance baseline in 2001 for complex tasks. We present our overall approach to data collection, the metrics collected, and the application of PARADISE to these data sets. We compare the results we achieved in 2000 for a number of core metrics with those for 2001. These results demonstrate large performance improvements from 2000 to 2001 and show that the Communicator program goal of conversational interaction for complex tasks has been achieved.
Original language | English (US) |
---|---|
Pages | 273-276 |
Number of pages | 4 |
State | Published - 2002 |
Event | 7th International Conference on Spoken Language Processing, ICSLP 2002 - Denver, United States; Duration: Sep 16 2002 → Sep 20 2002 |
All Science Journal Classification (ASJC) codes
- Language and Linguistics
- Linguistics and Language