TAILIEUCHUNG - Scientific report: "AUTOMATIC ALIGNMENT IN PARALLEL CORPORA"

Harris Papageorgiou, Lambros Cranias, Stelios Piperidis
Institute for Language and Speech Processing
22 Margari Street, 115 25 Athens, Greece

ABSTRACT

This paper addresses the alignment issue in the framework of exploiting large bi-/multilingual corpora for translation purposes. A generic alignment scheme is proposed that can meet the varying requirements of different applications. Depending on the level at which alignment is sought, appropriate surface linguistic information is invoked, coupled with information about possible unit delimiters. Each text unit (sentence, clause or phrase) is represented by the sum of its content tags. The results are then fed into a dynamic programming framework that computes the optimum alignment of units. The proposed scheme has been tested at sentence level on parallel corpora of the CELEX database. The success rate exceeded 99%. The next steps of the work concern testing the scheme's efficiency at lower levels, endowed with the necessary bilingual information about potential delimiters.

INTRODUCTION

Parallel, linguistically meaningful text units are indispensable in a number of NLP and lexicographic applications and, more recently, in so-called Example-Based Machine Translation (EBMT).
In EBMT, a large number of bi-/multilingual translation examples are stored in a database, and input expressions are rendered in the target language by retrieving from the database the example that is most similar to the input. A task of crucial importance in this framework is the establishment of correspondences between units of multilingual texts at sentence, phrase or even word level.

The adopted criteria for ascertaining the adequacy of alignment methods are stated as follows: an alignment scheme must cope with the embedded extra-linguistic data (tables, anchor points, SGML markers, etc.) and their possible ...

1. This research was supported by the LRE I TRANSLEARN project of the European Union.
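
The scheme summarised in the abstract (each unit reduced to the sum of its content tags, then aligned by dynamic programming) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the content tag set `CONTENT_TAGS`, the bead types and penalties in `BEADS`, and the distance in `bead_cost` are all assumptions chosen for illustration, since this excerpt does not specify them.

```python
from typing import List, Sequence, Tuple

# Assumed set of content (open-class) part-of-speech tags; not given in the excerpt.
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}

# Assumed bead types (source units, target units) with fixed penalties.
# 1-1 matches are free; insertions, deletions and merges are penalised.
BEADS = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}

Bead = Tuple[Tuple[int, int], Tuple[int, int]]


def content_tag_count(pos_tags: Sequence[str]) -> int:
    """Represent a text unit by the number of content tags it contains."""
    return sum(1 for tag in pos_tags if tag in CONTENT_TAGS)


def bead_cost(src_total: int, tgt_total: int) -> float:
    """Assumed distance: absolute difference of summed content-tag counts."""
    return float(abs(src_total - tgt_total))


def align(src: List[int], tgt: List[int]) -> List[Bead]:
    """Align two sequences of content-tag counts with dynamic programming.

    Returns beads as ((src_start, src_end), (tgt_start, tgt_end)),
    using half-open index ranges into `src` and `tgt`.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Forward pass: extend every reachable state by each allowed bead type.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = penalty + bead_cost(sum(src[i:ni]), sum(tgt[j:nj]))
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)

    # Backward pass: recover the lowest-cost path from (n, m) to (0, 0).
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append(((pi, i), (pj, j)))
        i, j = pi, pj
    beads.reverse()
    return beads


if __name__ == "__main__":
    # Four source units against three target units: the two middle
    # source units (3 + 4 content tags) merge onto one target unit (7).
    print(align([5, 3, 4, 6], [5, 7, 6]))
```

In this sketch a unit is first mapped to an integer with `content_tag_count`, and `align` then works only on those integers. A realistic cost function would presumably be estimated from data and would also use the delimiter information the abstract mentions; the fixed penalties here only make the dynamic programming step concrete.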
