Scientific report: "Computing Lattice BLEU Oracle Scores for Machine Translation"

Artem Sokolov, Guillaume Wisniewski, Francois Yvon
LIMSI-CNRS, Univ. Paris Sud, BP-133, 91 403 Orsay, France

Abstract

The search space of Phrase-Based Statistical Machine Translation (PBSMT) systems can be represented as a directed acyclic graph (lattice). The quality of this search space can thus be evaluated by computing the best achievable hypothesis in the lattice, the so-called oracle hypothesis. For common SMT metrics, however, this problem is NP-hard and can only be solved using heuristics. In this work, we present two new methods for efficiently computing BLEU oracles on lattices: the first one is based on a linear approximation of the corpus BLEU score and is solved using the FST formalism; the second one relies on an integer linear programming formulation and is solved both directly and using the Lagrangian relaxation framework. These new decoders are positively evaluated and compared with several alternatives from the literature for three language pairs, using lattices produced by two PBSMT systems.

1 Introduction

The search space of Phrase-Based Statistical Machine Translation (PBSMT) systems has the form of a very large directed acyclic graph.
Several software packages can output an approximation of this search space, either as an n-best list containing the n top hypotheses found by the decoder, or as a phrase or word graph (lattice) which compactly encodes those hypotheses that have survived search space pruning. Lattices usually contain many more hypotheses than n-best lists and better approximate the search space. Exploring the PBSMT search space is one of the few means to perform diagnostic analysis and to better understand the behavior of the system (Turchi et al., 2008; Auli et al., 2009). Useful diagnostics are, for instance, provided by looking at the best (oracle) hypotheses contained in the search space, i.e., those hypotheses that have the highest quality score with respect to ...
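
To make the notion of an oracle hypothesis concrete, here is a minimal Python sketch. It is not one of the two methods of the paper: it simply enumerates every path of a tiny word lattice and keeps the one with the highest score against a reference. The lattice, the function names (`paths`, `sentence_bleu`, `oracle`) and the add-one smoothed sentence-level BLEU are all assumptions made for illustration. Exhaustive enumeration is exponential in the lattice size, which is why real lattices call for approximations such as those described in the abstract.

```python
import math
from collections import Counter

# Toy word lattice: node -> list of (next_node, word). Node 0 is the start,
# node 3 the final state. Topology and words are made up for illustration.
LATTICE = {
    0: [(1, "the"), (1, "a")],
    1: [(2, "cat"), (2, "dog")],
    2: [(3, "sleeps"), (3, "sits")],
    3: [],
}


def paths(lattice, node, final):
    """Yield every word sequence from `node` to `final` (exponential in general)."""
    if node == final:
        yield []
        return
    for nxt, word in lattice[node]:
        for rest in paths(lattice, nxt, final):
            yield [word] + rest


def ngrams(words, n):
    """Count the n-grams of order n in a list of words."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def sentence_bleu(hyp, ref, max_n=2):
    """Add-one smoothed sentence-level BLEU with brevity penalty.

    This sentence-level proxy is an assumption of the sketch; corpus-level
    BLEU does not decompose over sentences in this way.
    """
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        # Clipped n-gram matches: each hypothesis n-gram counts at most as
        # often as it occurs in the reference.
        matched = sum(min(c, r[g]) for g, c in h.items())
        total = sum(h.values())
        log_prec += math.log((matched + 1) / (total + 1)) / max_n
    brevity = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity * math.exp(log_prec)


def oracle(lattice, start, final, ref):
    """Return the lattice path with the highest score against `ref`."""
    return max(paths(lattice, start, final), key=lambda h: sentence_bleu(h, ref))


if __name__ == "__main__":
    reference = ["the", "cat", "sits"]
    print(oracle(LATTICE, 0, 3, reference))
```

The toy lattice encodes 2 x 2 x 2 = 8 hypotheses in 6 edges, which hints at why lattices are a more compact approximation of the search space than an explicit n-best list.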
