A Portable Algorithm for Mapping Bitext Correspondence

I. Dan Melamed
Dept. of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104

Abstract

The first step in most empirical work in multilingual NLP is to construct maps of the correspondence between texts and their translations (bitext maps). The Smooth Injective Map Recognizer (SIMR) algorithm presented here is a generic pattern recognition algorithm that is particularly well-suited to mapping bitext correspondence. SIMR is faster and significantly more accurate than other algorithms in the literature. The algorithm is robust enough to use on noisy texts, such as those resulting from OCR input, and on translations that are not very literal. SIMR encapsulates its language-specific heuristics so that it can be ported to any language pair with minimal effort.

1 Introduction

Texts that are available in two languages (bitexts) are immensely valuable for many natural language processing applications.¹ Bitexts are the raw material from which translation models are built. In addition to their use in machine translation (Sato & Nagao, 1990; Brown et al., 1993; Melamed, 1997), translation models can be applied to machine-assisted translation (Sato, 1992; Foster et al., 1996), cross-lingual information retrieval (SIGIR, 1996), and gisting of World Wide Web pages (Resnik, 1997).
Bitexts also play a role in less automated applications, such as concordancing for bilingual lexicography (Catizone et al., 1993; Gale & Church, 1991b), computer-assisted language learning, and tools for translators (Macklovitch, 1995; Melamed, 1996b). However, bitexts are of little use without an automatic method for constructing bitext maps. Bitext maps identify corresponding text units between the two halves of a bitext. The ideal bitext mapping algorithm should be fast and accurate, use little memory, and degrade gracefully when …

¹ Multitexts in more than two languages are even more valuable, but they are much rarer.
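To make the idea of a bitext map concrete, here is a minimal sketch. It is a hypothetical illustration, not SIMR itself. It assumes a bitext map is stored as a list of (x, y) correspondence points, where x is a position in one half of the bitext and y is the corresponding position in the other half. The function name `is_injective_map` and the sample points are invented for illustration. The check reflects the "injective" in SIMR's name: no position in either half is used twice. The sketch does not model the "smooth" property.

```python
from typing import List, Tuple

# Hypothetical representation (an assumption, not from the paper):
# a bitext map is a list of (x, y) points, where x is a position in one
# half of the bitext and y is the corresponding position in the other half.
Point = Tuple[int, int]


def is_injective_map(points: List[Point]) -> bool:
    """Return True if the points describe a one-to-one partial map.

    The points form a function only if no x appears twice, and that
    function is injective only if no y appears twice.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return len(set(xs)) == len(xs) and len(set(ys)) == len(ys)


# Illustrative data only: positions in one half paired with positions in the other.
good_map = [(0, 0), (3, 4), (7, 9)]
bad_map = [(0, 0), (3, 4), (5, 4)]  # two x positions share y position 4

print(is_injective_map(good_map))  # True
print(is_injective_map(bad_map))   # False
```

Comparing set sizes with list lengths keeps the check linear in the number of points. A real mapping algorithm would also have to decide which candidate points to keep, which this sketch does not attempt.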
