Effects of Word Confusion Networks on Voice Search

Junlan Feng, Srinivas Bangalore
AT&T Labs-Research, Florham Park, NJ, USA
{junlan,srini}@research.att.com

Abstract

Mobile voice-enabled search is emerging as one of the most popular applications, abetted by the exponential growth in the number of mobile devices. The automatic speech recognition (ASR) output of the voice query is parsed into several fields. Search is then performed on a text corpus or a database. In order to improve the robustness of the query parser to noise in the ASR output, in this paper we investigate two different approaches to query parsing. Both methods exploit multiple hypotheses from ASR, in the form of word confusion networks, in order to achieve tighter coupling between ASR and query parsing and improved accuracy of the query parser. We also investigate the effect of this improvement on search accuracy. Word confusion network-based query parsing outperforms ASR 1-best based query parsing by 2.7% absolute, and the search performance improves by 1.8% absolute on one of our data sets.

1 Introduction

Local search specializes in serving geographically constrained search queries on a structured database of local business listings. Most text-based local search engines provide two text fields: the SearchTerm (e.g., Best Chinese Restaurant) and the LocationTerm (e.g., a city, state, street address, neighborhood, etc.).
Most voice-enabled local search dialog systems mimic this two-field approach and employ a two-turn dialog strategy. The dialog system solicits from the user a LocationTerm in the first turn, followed by a SearchTerm in the second turn (Wang et al., 2008). Although the two-field interface has been widely accepted, it has several limitations for mobile voice search. First, most mobile devices are location-aware, which obviates the need to specify the LocationTerm. Second, it's not always straightforward for users to be aware of the distinction between these two fields. It is common for users to specify location .
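
To make concrete why the abstract prefers word confusion networks over the ASR 1-best string, the sketch below models a confusion network as a sequence of slots, each holding competing words with posterior probabilities. It is only an illustration under stated assumptions: the names `Arc`, `one_best`, and `contains_path`, the example query, and all probabilities are hypothetical and not taken from the paper, and the empty (epsilon) arcs found in real confusion networks are left out for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Arc:
    """One competing word in a slot, with its posterior (hypothetical values)."""
    word: str
    posterior: float


# A confusion network is modeled here as a list of slots;
# each slot is a list of competing arcs.
ConfusionNetwork = List[List[Arc]]


def one_best(wcn: ConfusionNetwork) -> List[str]:
    """Pick the highest-posterior word in every slot (the ASR 1-best string)."""
    return [max(slot, key=lambda arc: arc.posterior).word for slot in wcn]


def contains_path(wcn: ConfusionNetwork, words: List[str]) -> bool:
    """Check whether a word sequence survives as some path through the network."""
    if len(words) != len(wcn):
        return False
    return all(any(arc.word == w for arc in slot) for slot, w in zip(wcn, words))


# Made-up network for the spoken query "best chinese restaurant in florham park".
wcn: ConfusionNetwork = [
    [Arc("best", 0.90), Arc("west", 0.10)],
    [Arc("chinese", 0.60), Arc("china's", 0.40)],
    [Arc("restaurant", 0.95), Arc("restaurants", 0.05)],
    [Arc("in", 1.00)],
    [Arc("floor", 0.55), Arc("florham", 0.45)],  # the recognizer's top choice is wrong here
    [Arc("park", 0.80), Arc("part", 0.20)],
]

spoken = "best chinese restaurant in florham park".split()
print(one_best(wcn))               # 1-best contains the misrecognized word "floor"
print(contains_path(wcn, spoken))  # the spoken query is still a path in the network
```

In this toy case the 1-best string loses the location word, while the network still contains it as a lower-ranked alternative, which is the kind of information a parser working on multiple hypotheses can draw on.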