ATLAS - a new text alignment architecture

Bettina Schrader
Institute of Cognitive Science
University of Osnabrück
49069 Osnabrück
bschrade@

Abstract

We present a new, hybrid alignment architecture for aligning bilingual, linguistically annotated parallel corpora. It can align simultaneously at paragraph, sentence, phrase and word level, using statistical and heuristic cues along with linguistics-based rules. The system currently aligns English and German texts, and the linguistic annotation used covers POS tags, lemmas and syntactic constituents. However, as the system is highly modular, it can easily be adapted to new language pairs and other types of annotation. The hybrid nature of the system allows experiments with a variety of alignment cues to find solutions to word alignment problems, such as the correct alignment of rare words and multiwords, or how to align despite syntactic differences between two languages. First performance tests are promising, and we are setting up a gold standard for a thorough evaluation of the system.

1 Introduction

Aligning parallel text, i.e. automatically setting the sentences or words in one text into correspondence with their equivalents in a translation, is a very useful preprocessing step for a range of applications, including but not limited to machine translation (Brown et al., 1993), cross-language information retrieval (Hiemstra, 1996), dictionary creation (Smadja et al., 1996) and the induction of NLP tools (Kuhn, 2004). Aligned corpora can also be used in translation studies (Neumann and Hansen-Schirra, 2005).

The alignment of sentences can be done sufficiently well using cues such as sentence length (Gale and Church, 1993) or cognates (Simard et al., 1992). Word alignment, however, is almost exclusively done using statistics (Brown et al., 1993; Hiemstra, 1996; Vogel et al., 1999; Toutanova et al., 2002). Hence it is difficult to align so-called rare events, i.e. tokens with a frequency below 10. This is a considerable drawback, as rare events …
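To make the sentence-length cue concrete, here is a minimal sketch of length-based sentence alignment in the style of Gale and Church (1993). It is not part of ATLAS. The constants are assumptions: C = 1.0 (expected target characters per source character), S2 = 6.8 (variance per character), and prior probabilities per bead type with the 1-0/0-1 and 2-1/1-2 figures applied to each direction, as some implementations do. Input is a list of sentence lengths in characters for each text; dynamic programming picks the sequence of beads (1-1, 1-0, 0-1, 2-1, 1-2, 2-2) with the lowest total cost.

```python
import math

# Assumed priors per bead type (source sentences, target sentences),
# following the figures reported by Gale & Church (1993).
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099, (0, 1): 0.0099,
    (2, 1): 0.089, (1, 2): 0.089,
    (2, 2): 0.011,
}
C = 1.0   # assumed: expected target chars per source char
S2 = 6.8  # assumed: variance of the length ratio per character


def match_cost(len_src: int, len_tgt: int, kind: tuple[int, int]) -> float:
    """-log P(kind) - log P(delta | match) for one candidate bead."""
    mean = (len_src + len_tgt / C) / 2
    delta = 0.0 if mean == 0 else (len_tgt - len_src * C) / math.sqrt(mean * S2)
    # Two-tailed probability of a standard normal deviation at least |delta|;
    # floored to avoid log(0) for extreme length mismatches.
    p_delta = max(math.erfc(abs(delta) / math.sqrt(2)), 1e-300)
    return -math.log(PRIORS[kind]) - math.log(p_delta)


def align(src: list[int], tgt: list[int]) -> list[tuple[list[int], list[int]]]:
    """Align two lists of sentence lengths; return beads of sentence indices."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in PRIORS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + match_cost(sum(src[i:ni]), sum(tgt[j:nj]), (di, dj))
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (di, dj)
    # Trace back the cheapest path from the final cell.
    beads, i, j = [], n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return beads[::-1]
```

For example, `align([10, 12, 30], [22, 30])` merges the two short source sentences into one 2-1 bead, because their combined length matches the first target sentence. Working only on lengths keeps the method language-independent, which is why it serves well for sentences but says nothing about which words correspond.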
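The cognate cue can be sketched in a similar way. The test below is an assumed simplification of the criterion usually attributed to Simard et al. (1992). Tokens containing digits or punctuation must match exactly, while purely alphabetic words of at least four letters count as cognates if they share their first four letters. The function name `simard_cognates` and its `prefix` parameter are hypothetical.

```python
def simard_cognates(w1: str, w2: str, prefix: int = 4) -> bool:
    """Assumed simplification of the Simard et al. (1992) cognate test."""
    a, b = w1.lower(), w2.lower()
    # Tokens with digits or punctuation (e.g. dates, numbers) must be identical.
    if any(not ch.isalpha() for ch in a + b):
        return a == b
    # Alphabetic words: both long enough and sharing the first `prefix` letters.
    return len(a) >= prefix and len(b) >= prefix and a[:prefix] == b[:prefix]
```

For an English-German pair, `simard_cognates("architecture", "Architektur")` holds, whereas `simard_cognates("house", "Haus")` does not, which shows that the cue is cheap but only catches orthographically similar translations.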
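The notion of a rare event used above (a token type with a corpus frequency below 10) can be stated directly in code. This is a minimal illustration only; the function name `rare_events` and the `threshold` parameter are hypothetical and not taken from ATLAS.

```python
from collections import Counter


def rare_events(tokens: list[str], threshold: int = 10) -> set[str]:
    """Return the token types whose corpus frequency is below `threshold`."""
    counts = Counter(tokens)
    return {tok for tok, freq in counts.items() if freq < threshold}
```

With `tokens = ["the"] * 12 + ["alignment"] * 3 + ["ATLAS"]`, both "alignment" and "ATLAS" are rare events. A purely statistical word aligner sees each of them in only a handful of sentence pairs, too few co-occurrences to estimate translation probabilities reliably.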

TÀI LIỆU MỚI ĐĂNG
28    165    1    08-01-2025
337    150    2    08-01-2025
TAILIEUCHUNG - Chia sẻ tài liệu không giới hạn
Địa chỉ : 444 Hoang Hoa Tham, Hanoi, Viet Nam
Website : tailieuchung.com
Email : tailieuchung20@gmail.com
Tailieuchung.com là thư viện tài liệu trực tuyến, nơi chia sẽ trao đổi hàng triệu tài liệu như luận văn đồ án, sách, giáo trình, đề thi.
Chúng tôi không chịu trách nhiệm liên quan đến các vấn đề bản quyền nội dung tài liệu được thành viên tự nguyện đăng tải lên, nếu phát hiện thấy tài liệu xấu hoặc tài liệu có bản quyền xin hãy email cho chúng tôi.
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.