Scientific report: "Learning Phrase-Based Spelling Error Models from Clickthrough Data"

Xu Sun, Dept. of Mathematical Informatics, University of Tokyo, Tokyo, Japan, xusun@
Daniel Micol, Microsoft Corporation, Munich, Germany, danielmi@
Jianfeng Gao, Microsoft Research, Redmond, WA, USA, jfgao@
Chris Quirk, Microsoft Research, Redmond, WA, USA, chrisq@

Abstract

This paper explores the use of clickthrough data for query spelling correction. First, large amounts of query-correction pairs are derived by analyzing users' query reformulation behavior encoded in the clickthrough data. Then, a phrase-based error model that accounts for the transformation probability between multi-term phrases is trained and integrated into a query speller system. Experiments are carried out on a human-labeled data set. Results show that the system using the phrase-based error model significantly outperforms its baseline systems.

1 Introduction

Search queries present a particular challenge for traditional spelling correction methods for three main reasons (Ahmad and Kondrak, 2004). First, spelling errors are more common in search queries than in regular written text: roughly 10-15% of queries contain misspelled terms (Cucerzan and Brill, 2004). Second, most search queries consist of a few keywords rather than grammatical sentences, making a grammar-based approach inappropriate. Most importantly, many queries contain search terms, such as proper nouns and names, which are not well established in the language. For example, Chen et al. (2007) reported that a portion of valid search terms do not occur in their 200K-entry spelling lexicon.
Therefore, recent research has focused on the use of Web corpora and query logs, rather than human-compiled lexicons, to infer knowledge about misspellings and word usage in search queries (Whitelaw et al., 2009). Another important data source that would be useful for this purpose is clickthrough data. Although it is well-known that clickthrough data contain ...
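
As a rough illustration of what a "transformation probability between multi-term phrases" can look like, the following minimal sketch estimates such probabilities by relative frequency. It assumes that query-correction pairs have already been mined and aligned at the phrase level; the function name `estimate_phrase_probs`, the toy data, and the relative-frequency estimator are all assumptions made for illustration, not the training procedure used in the paper.

```python
from collections import Counter


def estimate_phrase_probs(aligned_pairs):
    """Relative-frequency estimate of P(typed phrase | intended phrase).

    aligned_pairs: iterable of (intended_phrase, typed_phrase) tuples.
    This input format is a hypothetical simplification; real alignments
    would have to be derived from query-correction pairs first.
    """
    pair_counts = Counter()
    intended_counts = Counter()
    for intended, typed in aligned_pairs:
        pair_counts[(intended, typed)] += 1
        intended_counts[intended] += 1

    probs = {}
    for (intended, typed), count in pair_counts.items():
        # Normalize each pair count by how often the intended phrase occurs.
        probs.setdefault(intended, {})[typed] = count / intended_counts[intended]
    return probs


# Toy, made-up phrase alignments: (intended phrase, phrase as typed).
toy_pairs = [
    ("britney spears", "brittany spears"),
    ("britney spears", "britney spears"),
    ("britney spears", "britney spears"),
    ("new york", "newyork"),
]

probs = estimate_phrase_probs(toy_pairs)
for intended, typed_dist in probs.items():
    for typed, p in typed_dist.items():
        print(f"P({typed!r} | {intended!r}) = {p:.3f}")
```

Because the unit is a phrase rather than a single word, a multi-term transformation such as "new york" typed as "newyork" is captured as one event, which a purely word-level model cannot represent directly.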
