Scientific report: "Low-cost, High-performance Translation Retrieval: Dumber is Better"

Timothy Baldwin
Department of Computer Science, Tokyo Institute of Technology
2-12-1 O-okayama, Meguro-ku, Tokyo 152-8552 JAPAN
tim@

Abstract

In this paper, we compare the relative effects of segment order, segmentation and segment contiguity on the retrieval performance of a translation memory system. We take a selection of both bag-of-words and segment order-sensitive string comparison methods, and run each over both character- and word-segmented data, in combination with a range of local segment contiguity models (in the form of N-grams). Over two distinct datasets, we find that indexing according to simple character bigrams produces a retrieval accuracy superior to any of the tested word N-gram models. Further, in their optimum configuration, bag-of-words methods are shown to be equivalent to segment order-sensitive methods in terms of retrieval accuracy, but much faster. We also provide evidence that our findings are scalable.

1 Introduction

Translation memories (TMs) are a list of translation records (source language strings paired with a unique target language translation), which the TM system accesses in suggesting a list of target language (L2) translation candidates for a given source language (L1) input (Trujillo, 1999; Planas, 1998). Translation retrieval (TR) is a description of this process of selecting from the TM a set of translation records (TRecs) of maximum L1 similarity to a given input.
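
As a rough illustration of the retrieval step just defined, the following minimal sketch indexes each L1 string by its character bigrams and ranks translation records with an order-insensitive (bag-of-words style) comparison. The function names `char_ngrams`, `dice_similarity` and `retrieve`, the toy translation memory, and the choice of the Dice coefficient as the similarity measure are all assumptions made for this example; they are not taken from the paper and may differ from the comparison methods it evaluates.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return the multiset of character n-grams in `text` (bigrams by default)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def dice_similarity(a, b):
    """Dice coefficient over two n-gram multisets.

    Segment order is ignored: only the n-gram counts matter.
    This measure is an assumption of the sketch, not the paper's.
    """
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    overlap = sum((a & b).values())  # multiset intersection keeps minimum counts
    return 2.0 * overlap / total


def retrieve(query, memory, n=2, top_k=1):
    """Return the `top_k` (score, L1 string, L2 string) records most similar to `query`.

    `memory` is a list of (L1 string, L2 string) pairs, i.e. a toy TM.
    """
    query_grams = char_ngrams(query, n)
    scored = [
        (dice_similarity(query_grams, char_ngrams(source, n)), source, target)
        for source, target in memory
    ]
    scored.sort(key=lambda record: record[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical translation records, for illustration only.
    toy_memory = [
        ("open the file", "ouvrir le fichier"),
        ("close the window", "fermer la fenêtre"),
        ("save the file", "enregistrer le fichier"),
    ]
    for score, source, target in retrieve("open the files", toy_memory, top_k=2):
        print(f"{score:.2f}\t{source}\t{target}")
```

Because the comparison works on n-gram counts alone, it needs no word segmentation of the input, which is one reason a character-based index of this kind is cheap to build.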
Typically in example-based machine translation, either a single TRec is retrieved from the TM based on a match with the overall L1 input, or the input is partitioned into coherent segments and individual translations retrieved for each (Sato and Nagao, 1990; Nirenburg et al., 1993); this is the first step toward generating a customised translation for the input. With stand-alone TM systems, on the other hand, the system selects an arbitrary number of translation candidates falling within a certain empirical corridor of .
