Char_align: A Program for Aligning Parallel Texts at the Character Level

Kenneth Ward Church
AT&T Bell Laboratories
600 Mountain Avenue
Murray Hill, NJ 07974-0636
kwc@

Abstract

There have been a number of recent papers on aligning parallel texts at the sentence level, e.g., Brown et al (1991), Gale and Church (to appear), Isabelle (1992), Kay and Rösenschein (to appear), Simard et al (1992), Warwick-Armstrong and Russell (1990). On clean inputs, such as the Canadian Hansards, these methods have been very successful (at least 96% correct by sentence). Unfortunately, if the input is noisy (due to OCR and/or unknown markup conventions), then these methods tend to break down because the noise can make it difficult to find paragraph boundaries, let alone sentences. This paper describes a new program, char_align, that aligns texts at the character level rather than at the sentence/paragraph level, based on the cognate approach proposed by Simard et al.

1. Introduction

Parallel texts have recently received considerable attention in machine translation (e.g., Brown et al, 1990), bilingual lexicography (e.g., Klavans and Tzoukermann, 1990), and terminology research for human translators (e.g., Isabelle, 1992). We have been most interested in the terminology application. Translators find it extremely embarrassing when "store" (in the computer sense) is translated as "grocery," or when "magnetic fields" is translated as "magnetic meadows."

Terminology errors of this kind are all too common because the translator is generally not as familiar with the subject domain as the author of the source text or the readers of the target text. Parallel texts could be used to help translators overcome their lack of domain expertise by providing them with the ability to search previously translated documents for examples of potentially difficult expressions and see how they were translated in the past. While pursuing this possibility with a commercial translation organization, AT&T Language Line Services, we discovered that we needed
