Scientific report: "Modeling with Structures in Statistical Machine Translation"

Ye-Yi Wang and Alex Waibel
School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
yyw waibel

Abstract

Most statistical machine translation systems employ a word-based alignment model. In this paper we demonstrate that word-based alignment is a major cause of translation errors. We propose a new alignment model based on shallow phrase structures, which can be automatically acquired from a parallel corpus. This new model achieved over 10% error reduction for our spoken language translation task.

1 Introduction

Most, if not all, statistical machine translation systems employ a word-based alignment model (Brown et al., 1993; Vogel, Ney, and Tillman, 1996; Wang and Waibel, 1997), which treats words in a sentence as independent entities and ignores the structural relationships among them. While this independence assumption works well in speech recognition, it poses a major problem in our experiments with spoken language translation between a language pair with very different word orders.

In this paper we propose a translation model that employs shallow phrase structures. It has the following advantages over word-based alignment. First, since the translation model can directly depict phrase reordering in translation, it is more accurate for translation between languages with different word/phrase orders. Second, the decoder of the translation system can use the phrase information and extend hypotheses by phrases (multiple words), which can speed up decoding.
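To make the independence assumption concrete, the following minimal sketch scores a sentence pair with an IBM Model 1 style word-based model in the spirit of Brown et al. (1993). It is an illustration only, not the system used in this paper: the lexicon `T`, the smoothing value `FLOOR`, and the function name `model1_score` are assumptions introduced here. Because every target word is scored independently against all source words, reordering the source sentence leaves the score essentially unchanged, which is one way to see that such a model ignores structure.

```python
import math
from itertools import permutations

# Hypothetical toy lexicon t(f | e): probability that the English word e
# translates to the German word f. The values are made up for illustration.
T = {
    ("ich", "i"): 0.9,
    ("habe", "have"): 0.8,
    ("zeit", "time"): 0.9,
}
FLOOR = 1e-3  # assumed smoothing probability for word pairs not in T


def model1_score(target, source, epsilon=1.0):
    """IBM Model 1 style likelihood P(target | source).

    Each target word is generated independently from any source word
    (or from an empty NULL word), so word order plays no role.
    """
    src = ["NULL"] + list(source)
    score = epsilon / (len(src) ** len(target))
    for f in target:
        score *= sum(T.get((f, e), FLOOR) for e in src)
    return score


if __name__ == "__main__":
    german = ["ich", "habe", "zeit"]
    english = ["i", "have", "time"]
    reference = model1_score(german, english)
    # Every reordering of the English sentence receives the same score
    # (up to floating-point rounding).
    for perm in permutations(english):
        score = model1_score(german, list(perm))
        print(" ".join(perm), math.isclose(score, reference))
```

A model that aligns phrases rather than single words can, by contrast, assign different scores to different phrase orders, which is the property the proposed model relies on.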
The paper is organized as follows. In section 2, the problems of word-based alignment models are discussed. To alleviate these problems, a new alignment model based on shallow phrase structures is introduced in section 3. In section 4, a grammar inference algorithm is presented that can automatically acquire the phrase structures used in the new model. Translation performance is then evaluated in section 5 and conclusions are .
