Using Search-Logs to Improve Query Tagging

Kuzman Ganchev, Keith Hall, Ryan McDonald, Slav Petrov
Google Inc.
{kuzman,kbhall,ryanmcd,slav}@

Abstract

Syntactic analysis of search queries is important for a variety of information-retrieval tasks; however, the lack of annotated data makes training query-analysis models difficult. We propose a simple, efficient procedure in which part-of-speech tags are transferred from retrieval-result snippets to queries at training time. Unlike previous work, our final model does not require any additional resources at run time. Compared to a state-of-the-art approach, we achieve more than 20% relative error reduction. Additionally, we annotate a corpus of search queries with part-of-speech tags, providing a resource for future work on syntactic query analysis.

1 Introduction

Syntactic analysis of search queries is important for a variety of tasks, including better query refinement, improved matching, and better ad targeting (Barr et al., 2008). However, search queries differ substantially from traditional forms of written language (no capitalization, few function words, fairly free word order, etc.) and are therefore difficult to process with natural language processing tools trained on standard corpora (Barr et al., 2008). In this paper, we focus on part-of-speech (POS) tagging of queries entered into commercial search engines and compare different strategies for learning from search logs. The search logs consist of user queries and relevant search results retrieved by a search engine. We use a supervised POS tagger to label the result snippets and then transfer the tags to the queries, producing a set of noisy labeled queries.
These labeled queries are then added to the training data, and the tagger is retrained. We evaluate different strategies for selecting which annotation to transfer and find that using the result clicked by the user performs comparably to using just the top result or to aggregating over the top-k results.
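
The transfer step and the three selection strategies named above (top result, clicked result, aggregation over the top-k results) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the snippets are already tokenized and tagged, aligns each query token to its first case-insensitive exact match in a snippet, uses a simple majority vote for top-k aggregation, and discards any query with an unmatched token. The alignment rule, the vote, the filtering step, the log format, and all function names (`transfer_tags`, `aggregate_top_k`, `label_query`, `build_noisy_training_set`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical input format: a result snippet already labeled by a
# supervised POS tagger, as a list of (token, tag) pairs.
TaggedSnippet = List[Tuple[str, str]]
# Hypothetical log entry: query tokens, tagged snippets in rank order,
# and the index of the clicked result (None if nothing was clicked).
LogEntry = Tuple[List[str], List[TaggedSnippet], Optional[int]]


def transfer_tags(query: Sequence[str], snippet: TaggedSnippet) -> List[Optional[str]]:
    """Give each query token the tag of its first case-insensitive match
    in the snippet (an assumed alignment rule); unmatched tokens get None."""
    first_tag: Dict[str, str] = {}
    for token, tag in snippet:
        first_tag.setdefault(token.lower(), tag)
    return [first_tag.get(tok.lower()) for tok in query]


def aggregate_top_k(query: Sequence[str], results: Sequence[TaggedSnippet],
                    k: int) -> List[Optional[str]]:
    """Majority vote over the tags transferred from the top-k snippets."""
    votes = [Counter() for _ in query]
    for snippet in results[:k]:
        for i, tag in enumerate(transfer_tags(query, snippet)):
            if tag is not None:
                votes[i][tag] += 1
    # Ties go to the tag counted first (most_common keeps first-seen order).
    return [v.most_common(1)[0][0] if v else None for v in votes]


def label_query(query: Sequence[str], results: Sequence[TaggedSnippet],
                strategy: str, clicked: Optional[int] = None,
                k: int = 3) -> Optional[List[Optional[str]]]:
    """Transfer tags to one query under a selection strategy; return None
    when the strategy cannot be applied to this log entry."""
    if not results:
        return None
    if strategy == "top":
        return transfer_tags(query, results[0])
    if strategy == "clicked":
        if clicked is None or not 0 <= clicked < len(results):
            return None
        return transfer_tags(query, results[clicked])
    if strategy == "top-k":
        return aggregate_top_k(query, results, k)
    raise ValueError(f"unknown strategy: {strategy}")


def build_noisy_training_set(search_log: Sequence[LogEntry], strategy: str,
                             k: int = 3) -> List[List[Tuple[str, str]]]:
    """Build noisy (token, tag) training sentences, keeping only queries
    in which every token received a tag (an assumed filtering step)."""
    labeled = []
    for query, results, clicked in search_log:
        tags = label_query(query, results, strategy, clicked=clicked, k=k)
        if tags is not None and all(t is not None for t in tags):
            labeled.append(list(zip(query, tags)))
    return labeled


# Toy example data (invented for illustration only).
s1 = [("Cheap", "ADJ"), ("hotels", "NOUN"), ("in", "ADP"), ("Paris", "NOUN")]
s2 = [("Paris", "NOUN"), ("hotels", "VERB")]  # a snippet with a tagging error
s3 = [("Hotels", "NOUN"), ("near", "ADP"), ("Paris", "NOUN")]
search_log = [(["paris", "hotels"], [s1, s2, s3], 1),
              (["paris", "wifi"], [s1], None)]

for strategy in ("top", "clicked", "top-k"):
    print(strategy, build_noisy_training_set(search_log, strategy))
```

In this toy log, the clicked snippet happens to carry a tagging error for "hotels", while voting over the top three snippets recovers NOUN. The noisy sentences would then be appended to the gold training data before the tagger is retrained; the tagger itself is not shown.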
