Scientific report: "A Topic Similarity Model for Hierarchical Phrase-based Translation"

Xinyan Xiao, Deyi Xiong, Min Zhang, Qun Liu, Shouxun Lin

Key Lab. of Intelligent Info. Processing, Institute of Computing Technology, Chinese Academy of Sciences (xiaoxinyan, liuqun, sxlin @ ...)
Human Language Technology, Institute for Infocomm Research (dyxiong, mzhang @ ...)

Abstract

Previous work using topic models for statistical machine translation (SMT) explores topic information at the word level. However, SMT has advanced from the word-based paradigm to the phrase/rule-based paradigm. We therefore propose a topic similarity model to exploit topic information at the synchronous rule level for hierarchical phrase-based translation. We associate each synchronous rule with a topic distribution and select desirable rules according to the similarity of their topic distributions with given documents. We show that our model significantly improves translation performance over the baseline in NIST Chinese-to-English translation experiments. Our model also achieves better performance and faster speed than previous approaches that work at the word level.

1 Introduction

Topic models (Hofmann, 1999; Blei et al., 2003) are a popular technique for discovering the underlying topic structure of documents. To exploit topic information for statistical machine translation (SMT), researchers have proposed various topic-specific lexicon translation models (Zhao and Xing, 2006; Zhao and Xing, 2007; Tam et al., 2007) to improve translation quality.

Topic-specific lexicon translation models focus on word-level translations. Such models first estimate word translation probabilities conditioned on topics and then adapt the lexical weights of phrases by these probabilities. However, state-of-the-art SMT systems translate sentences by using sequences of synchronous rules or phrases instead of translating word by word. Since a synchronous rule is rarely factorized into individual words, we believe that it ...
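
The rule-selection idea stated in the abstract (each synchronous rule carries a topic distribution, and rules whose distribution is close to that of the document are preferred) can be illustrated with a minimal sketch. The excerpt does not say which similarity measure is used, so the sketch assumes one minus the Hellinger distance; the function names, the rule identifiers, and the three-topic distributions are all hypothetical and serve only as an illustration.

```python
import math


def hellinger_similarity(p, q):
    """Return 1 - Hellinger distance between two discrete distributions.

    Assumption: the similarity measure is chosen for illustration only;
    the excerpt does not specify one. The result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return 1.0 - math.sqrt(squared / 2.0)


def rank_rules(rule_topics, doc_topics):
    """Order rule identifiers from most to least similar to the document."""
    scored = [
        (hellinger_similarity(dist, doc_topics), rule)
        for rule, dist in rule_topics.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rule for _, rule in scored]


# Hypothetical topic distributions over three topics.
doc_topics = [0.8, 0.1, 0.1]
rule_topics = {
    "rule_sports": [0.1, 0.8, 0.1],
    "rule_finance": [0.9, 0.05, 0.05],
}

ranking = rank_rules(rule_topics, doc_topics)
print(ranking)
```

In a decoder, such a score would typically enter as one additional feature next to the usual translation features rather than as a hard filter; that integration is not shown here.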
