Scientific report: "Decoding Algorithm in Statistical Machine Translation"

Ye-Yi Wang and Alex Waibel
Language Technology Institute, School of Computer Science
Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213, USA
yyw waibel @

Abstract

A decoding algorithm is a crucial part of statistical machine translation. We describe a stack decoding algorithm in this paper. We present the hypothesis scoring method and the heuristics used in our algorithm. We report several techniques deployed to improve the performance of the decoder. We also introduce a simplified model to moderate the sparse data problem and to speed up the decoding process. We evaluate and compare these techniques/models in our statistical machine translation system.

1 Introduction

Statistical Machine Translation

Statistical machine translation is based on a channel model. Given a sentence $T$ in one language (German) to be translated into another language (English), it considers $T$ as the target of a communication channel and its translation $S$ as the source of the channel. Hence the machine translation task becomes recovering the source from the target. Basically, every English sentence is a possible source for a German target sentence. If we assign a probability $P(S \mid T)$ to each pair of sentences $(S, T)$, then the problem of translation is to find the source $S$ for a given target $T$ such that $P(S \mid T)$ is the maximum.
According to Bayes' rule,

$$P(S \mid T) = \frac{P(S)\,P(T \mid S)}{P(T)} \qquad (1)$$

Since the denominator is independent of $S$, we have

$$\hat{S} = \arg\max_{S} P(S)\,P(T \mid S) \qquad (2)$$

Therefore a statistical machine translation system must deal with the following three problems:

Modeling Problem: How to depict the process of generating a sentence in a source language, and the process used by a channel to generate a target sentence upon receiving a source sentence? The former is the problem of language modeling, and the latter is the problem of translation modeling. They provide a framework for calculating $P(S)$ and $P(T \mid S)$ in (2).

Learning Problem: Given a statistical language model P
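
The decision rule in (2) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the candidate sentences, the probability values, and the names `LANGUAGE_MODEL`, `TRANSLATION_MODEL`, `score`, and `decode` are hypothetical and do not come from the paper. The sketch enumerates a tiny candidate set exhaustively; it is not the stack decoding algorithm the paper describes, which exists precisely because exhaustive enumeration is infeasible over all English sentences.

```python
import math

# Hypothetical toy models. All sentences and probabilities are invented
# for illustration; they are not taken from the paper.
LANGUAGE_MODEL = {  # P(S): prior probability of each English source sentence
    "i am hungry": 0.6,
    "i have hunger": 0.1,
    "hungry i am": 0.3,
}
TRANSLATION_MODEL = {  # P(T | S), keyed by (target, source)
    ("ich habe hunger", "i am hungry"): 0.3,
    ("ich habe hunger", "i have hunger"): 0.6,
    ("ich habe hunger", "hungry i am"): 0.1,
}


def score(source, target):
    """Return log P(S) + log P(T | S), or -inf if either factor is zero or unknown."""
    p_s = LANGUAGE_MODEL.get(source, 0.0)
    p_t_given_s = TRANSLATION_MODEL.get((target, source), 0.0)
    if p_s == 0.0 or p_t_given_s == 0.0:
        return float("-inf")
    # Log space avoids underflow when many small probabilities are multiplied.
    return math.log(p_s) + math.log(p_t_given_s)


def decode(target, candidates):
    """Pick the candidate S that maximizes P(S) * P(T | S), as in rule (2)."""
    return max(candidates, key=lambda s: score(s, target))


best = decode("ich habe hunger", list(LANGUAGE_MODEL))
print(best)
```

With these invented numbers, the translation model alone would prefer the literal "i have hunger", but the product with the language model favors "i am hungry". This is the division of labor that (2) expresses: $P(T \mid S)$ rewards faithfulness to the target, and $P(S)$ rewards well-formed source sentences.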
