TAILIEUCHUNG - Scientific report: "Learning Phrase-Based Spelling Error Models from Clickthrough Data"

Xu Sun, Dept. of Mathematical Informatics, University of Tokyo, Tokyo, Japan, xusun@
Daniel Micol, Microsoft Corporation, Munich, Germany, danielmi@
Jianfeng Gao, Microsoft Research, Redmond, WA, USA, jfgao@
Chris Quirk, Microsoft Research, Redmond, WA, USA, chrisq@

Abstract

This paper explores the use of clickthrough data for query spelling correction. First, large amounts of query-correction pairs are derived by analyzing users' query reformulation behavior encoded in the clickthrough data. Then, a phrase-based error model that accounts for the transformation probability between multi-term phrases is trained and integrated into a query speller system. Experiments are carried out on a human-labeled data set. Results show that the system using the phrase-based error model significantly outperforms its baseline systems.

1 Introduction

Search queries present a particular challenge for traditional spelling correction methods for three main reasons (Ahmad and Kondrak, 2004). First, spelling errors are more common in search queries than in regular written text: roughly 10-15% of queries contain misspelled terms (Cucerzan and Brill, 2004). Second, most search queries consist of a few key words rather than grammatical sentences, making a grammar-based approach inappropriate. Most importantly, many queries contain search terms, such as proper nouns and names, which are not well established in the language. For example, Chen et al. (2007) reported that [...] of valid search terms do not occur in their 200K-entry spelling lexicon.
Therefore, recent research has focused on the use of Web corpora and query logs, rather than human-compiled lexicons, to infer knowledge about misspellings and word usage in search queries (Whitelaw et al., 2009). Another important data source that would be useful for this purpose is clickthrough data. Although it is well known that clickthrough data contain ...
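
The abstract describes training an error model that assigns transformation probabilities between multi-term phrases, using query-correction pairs as training data. The sketch below is only a toy illustration of that idea, not the authors' method: the excerpt does not give their alignment or estimation procedure, so the code assumes each query and its correction have the same number of words and are aligned position by position, and it uses plain relative-frequency estimates. The function names `extract_phrase_pairs` and `train_phrase_model`, the `max_len` parameter, and the example pairs are all hypothetical.

```python
from collections import Counter


def extract_phrase_pairs(query, correction, max_len=2):
    """Yield (correction_phrase, query_phrase) for every contiguous span.

    Toy assumption: query and correction have the same number of words
    and word i of the query aligns with word i of the correction.
    Pairs that violate this are skipped rather than aligned properly.
    """
    q_words, c_words = query.split(), correction.split()
    if len(q_words) != len(c_words):
        return
    for n in range(1, max_len + 1):
        for i in range(len(q_words) - n + 1):
            yield " ".join(c_words[i:i + n]), " ".join(q_words[i:i + n])


def train_phrase_model(pairs, max_len=2):
    """Estimate P(query_phrase | correction_phrase) by relative frequency."""
    joint = Counter()      # counts of (correction_phrase, query_phrase)
    marginal = Counter()   # counts of correction_phrase alone
    for query, correction in pairs:
        for c_phrase, q_phrase in extract_phrase_pairs(query, correction, max_len):
            joint[(c_phrase, q_phrase)] += 1
            marginal[c_phrase] += 1
    return {key: count / marginal[key[0]] for key, count in joint.items()}


# Hypothetical (query, correction) pairs standing in for pairs mined from logs.
pairs = [
    ("britny spears", "britney spears"),
    ("britney spears", "britney spears"),
    ("britny spears", "britney spears"),
    ("new yrok", "new york"),
]
model = train_phrase_model(pairs)
for (c_phrase, q_phrase), prob in sorted(model.items()):
    print(f"P({q_phrase!r} | {c_phrase!r}) = {prob:.3f}")
```

In a speller, probabilities of this kind would be combined with other signals to rank candidate corrections; how the paper does that is not stated in this excerpt.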
