Scientific report: "Learning Phrase-Based Spelling Error Models from Clickthrough Data"


Xu Sun, Dept. of Mathematical Informatics, University of Tokyo, Tokyo, Japan (xusun@mist.i.u-tokyo.ac.jp)
Daniel Micol, Microsoft Corporation, Munich, Germany (danielmi@microsoft.com)
Jianfeng Gao, Microsoft Research, Redmond, WA, USA (jfgao@microsoft.com)
Chris Quirk, Microsoft Research, Redmond, WA, USA (chrisq@microsoft.com)

Abstract

This paper explores the use of clickthrough data for query spelling correction. First, large amounts of query-correction pairs are derived by analyzing users' query reformulation behavior encoded in the clickthrough data. Then, a phrase-based error model that accounts for the transformation probability between multi-term phrases is trained and integrated into a query speller system. Experiments are carried out on a human-labeled data set. Results show that the system using the phrase-based error model significantly outperforms its baseline systems.

1 Introduction

Search queries present a particular challenge for traditional spelling correction methods for three main reasons (Ahmad and Kondrak, 2004). First, spelling errors are more common in search queries than in regular written text: roughly 10-15% of queries contain misspelled terms (Cucerzan and Brill, 2004). Second, most search queries consist of a few key words rather than grammatical sentences, making a grammar-based approach inappropriate. Most importantly, many queries contain search terms, such as proper nouns and names, which are not well established in the language. For example, Chen et al. (2007) reported that 16.5% of valid search terms do not occur in their 200K-entry spelling lexicon.

Therefore, recent research has focused on the use of Web corpora and query logs, rather than human-compiled lexicons, to infer knowledge about misspellings and word usage in search queries (e.g., Whitelaw et al., 2009). Another important data source that would be useful for this purpose is clickthrough data. Although it is well-known that clickthrough data contain ...

