Scientific report: "Unsupervised Search for The Optimal Segmentation for Statistical Machine Translation"

Coşkun Mermer (1,3) and Ahmet Afsin Akin (2,3)
(1) Bogazici University, Bebek, Istanbul, Turkey
(2) Istanbul Technical University, Sariyer, Istanbul, Turkey
(3) TUBITAK-UEKAE, Gebze, Kocaeli, Turkey
{coskun, ahmetaa}@

Abstract

We tackle the previously unaddressed problem of unsupervised determination of the optimal morphological segmentation for statistical machine translation (SMT) and propose a segmentation metric that takes into account both sides of the SMT training corpus. We formulate the objective function as the posterior probability of the training corpus according to a generative segmentation-translation model. We describe how the IBM Model-1 translation likelihood can be computed incrementally between adjacent segmentation states for efficient computation. Applying the proposed segmentation method to an SMT task from morphologically rich Turkish to English does not yield the expected improvement in translation BLEU scores and confirms the robustness of phrase-based SMT to translation unit combinatorics. A positive outcome of this work is the described modification to the sequential search algorithm of Morfessor (Creutz and Lagus, 2007) that enables arbitrary-fold parallelization of the computation, which unexpectedly improves the translation performance as measured by BLEU.
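
For readers unfamiliar with the IBM Model-1 likelihood mentioned in the abstract, the following is a minimal sketch of the standard per-sentence formula, P(f | e) = epsilon / (l + 1)^m * prod_j sum_i t(f_j | e_i), computed in log space. It is not the authors' implementation. The function name, the `<NULL>` token, the `floor` parameter, and the toy probability table are assumptions made for illustration only.

```python
import math


def model1_log_likelihood(src_tokens, tgt_tokens, t_table, epsilon=1.0, floor=1e-12):
    """Return log P(tgt | src) under IBM Model-1.

    t_table maps (tgt_token, src_token) pairs to lexical translation
    probabilities t(tgt | src). A NULL source token is prepended, as in
    the standard formulation. `floor` (an assumption of this sketch)
    avoids log(0) for target tokens with no table entry.
    """
    src_with_null = ["<NULL>"] + list(src_tokens)
    l = len(src_tokens)
    m = len(tgt_tokens)
    # Length and uniform-alignment term: epsilon / (l + 1)^m
    log_p = math.log(epsilon) - m * math.log(l + 1)
    # Each target token sums over all source positions, including NULL.
    for f in tgt_tokens:
        total = sum(t_table.get((f, e), 0.0) for e in src_with_null)
        log_p += math.log(max(total, floor))
    return log_p


# Hypothetical toy table: one source morph, one target word.
toy_table = {("house", "ev"): 0.8, ("house", "<NULL>"): 0.2}
toy_score = model1_log_likelihood(["ev"], ["house"], toy_table)
```

Because the corpus likelihood is a product of such per-sentence terms, changing the segmentation of one word type only affects the sentence pairs that contain it. That locality is what makes an incremental update between adjacent segmentation states plausible, although the exact update used in the paper is not shown in this excerpt.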

1 Introduction

In statistical machine translation (SMT), words are normally considered as the building blocks of translation models. However, especially for morphologically complex languages such as Finnish, Turkish, Czech, Arabic, etc., it has been shown that using sub-lexical units obtained after morphological preprocessing can improve the machine translation performance over a word-based system (Habash and Sadat, 2006; Oflazer and Durgar El-Kahlout, 2007; Bisazza and Federico, 2009). However, the effect of segmentation on translation performance is indirect and difficult to isolate (Lopez and Resnik, 2006). The challenge ...
