Scientific report: "The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval"

The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

Jiang Zhu, Haifeng Wang
Toshiba (China) Research and Development Center
5/F., Tower W2, Oriental Plaza, East Chang An Ave., Dong Cheng District, Beijing 100738, China
zhujiang wanghaifeng @

Abstract

This paper explores the relationship between the translation quality and the retrieval effectiveness in Machine Translation (MT) based Cross-Language Information Retrieval (CLIR). To obtain MT systems of different translation quality, we degrade a rule-based MT system by decreasing the size of the rule base and the size of the dictionary. We use the degraded MT systems to translate queries and submit the translated queries of varying quality to the IR system. Retrieval effectiveness is found to correlate highly with the translation quality of the queries. We further analyze the factors that affect the retrieval effectiveness. Title queries are found to be preferred in MT-based CLIR. In addition, dictionary-based degradation is shown to have stronger impact than rule-based degradation in MT-based CLIR.

1 Introduction

Cross-Language Information Retrieval (CLIR) enables users to construct queries in one language and search the documents in another language.
CLIR requires that either the queries or the documents be translated from one language into another using available translation resources. Previous studies have concentrated on query translation because it is computationally less expensive than document translation, which requires a lot of processing time and storage (Hull and Grefenstette, 1996). There are three kinds of methods to perform query translation, namely Machine Translation (MT) based methods, dictionary-based methods, and corpus-based methods. Corresponding to these methods, three types of translation resources are required: MT systems, bilingual wordlists, and parallel or comparable corpora. CLIR effectiveness depends on both the design of the retrieval system and the quality of
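
The abstract's central measurement, how strongly retrieval effectiveness tracks the translation quality of the queries, can be illustrated with a small correlation computation. The sketch below is only an illustration under stated assumptions. Pearson's coefficient is chosen here for convenience, since this excerpt does not say which correlation measure, translation quality metric, or retrieval metric the authors use. The names `pearson`, `translation_quality`, and `retrieval_effectiveness` are hypothetical, and the numbers are placeholders, not results from the paper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values, NOT results from the paper.
# One entry per degraded MT system: a translation quality score for its
# translated queries, and the retrieval effectiveness those queries obtain.
translation_quality = [0.10, 0.15, 0.20, 0.25, 0.30]
retrieval_effectiveness = [0.12, 0.16, 0.19, 0.25, 0.28]

r = pearson(translation_quality, retrieval_effectiveness)
print(f"Pearson r = {r:.3f}")
```

In an actual experiment, each pair would come from one degraded MT configuration (a reduced rule base or a reduced dictionary), so that the coefficient summarizes the trend across all configurations.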
