TAILIEUCHUNG - Scientific report: "Disambiguating Temporal–Contrastive Discourse Connectives for Machine Translation"

Disambiguating Temporal–Contrastive Discourse Connectives for Machine Translation

Thomas Meyer
Idiap Research Institute, Martigny, Switzerland
EPFL - EDEE doctoral school, Lausanne, Switzerland

Abstract

Temporal–contrastive discourse connectives (although, while, since, etc.) signal various types of relations between clauses such as temporal, contrast, concession and cause. They are often ambiguous and therefore difficult to translate from one language to another. We discuss several new and translation-oriented experiments for the disambiguation of a specific subset of discourse connectives in order to correct some of the translation errors made by current statistical machine translation systems.

1 Introduction

The probabilistic phrase-based models used in statistical machine translation (SMT) have been improved by integrating linguistic information during training stages. Recent attempts include, for example, the reordering of the source language syntax in order to align it closer to the target language word order (Collins et al., 2010) or the tagging of pronouns for grammatical gender agreement (Le Nagard and Koehn, 2010). On the other hand, integrating discourse information, such as discourse relations holding between two spans of text or between sentences, has not yet been applied to SMT.

This paper describes several disambiguation and translation experiments for a specific subset of discourse connectives. Based on examinations in multilingual corpora, we identified the connectives although, but, however, meanwhile, since, though, when and while as being particularly problematic for machine translation. These discourse connectives signal various types of relations between clauses, such as temporal, contrast, concession, expansion, cause and condition, which are, as we also show, hard to annotate even by humans. Disambiguating these senses and tagging them in large corpora is hypothesized to help in improving SMT systems to avoid translation errors. The paper is organized .
