Aligning words using matrix factorisation

Cyril Goutte, Kenji Yamada and Eric Gaussier
Xerox Research Centre Europe
6 chemin de Maupertuis
F-38240 Meylan, France

Abstract

Aligning words from sentences which are mutual translations is an important problem in different settings, such as bilingual terminology extraction, Machine Translation, or projection of linguistic features. Here, we view word alignment as matrix factorisation. In order to produce proper alignments, we show that the factors must satisfy a number of constraints, such as orthogonality. We then propose an algorithm for orthogonal non-negative matrix factorisation, based on a probabilistic model of the alignment data, and apply it to word alignment. This is illustrated on a French-English alignment task from the Hansard.

1 Introduction

Aligning words from mutually translated sentences in two different languages is an important and difficult problem. It is important because a word-aligned corpus is typically used as a first step to identify phrases or templates in phrase-based Machine Translation (Och et al., 1999; Tillmann and Xia, 2003; Koehn et al., 2003, sec. 3), or to project linguistic annotation across languages (Yarowsky et al., 2001). Obtaining a word-aligned corpus usually involves training word-based translation models (Brown et al., 1993) in each direction and combining the resulting alignments. Besides processing time, important issues are the completeness and propriety of the resulting alignment, and the ability to reliably identify general N-to-M alignments.
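The combination step and the two quality criteria mentioned above can be made concrete with a small sketch. The following are our assumptions, not definitions taken from this section: an alignment is a set of (i, j) links between French position i and English position j; the two directional alignments are combined by plain intersection or union, which are common heuristics rather than anything prescribed here; "complete" means every word has at least one link; and "proper" is taken to mean that each connected group of links forms a full N-to-M block, i.e. every French word in the group is linked to every English word in it. The names `symmetrise`, `is_complete` and `is_proper` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch: an alignment is a set of (i, j) links, where i is a
# French word position and j an English word position (both 0-based).


def symmetrise(f2e, e2f):
    """Combine two directional alignments by intersection and by union.

    f2e holds (i, j) links; e2f holds (j, i) links and is flipped first.
    """
    e2f_flipped = {(i, j) for (j, i) in e2f}
    return f2e & e2f_flipped, f2e | e2f_flipped


def is_complete(links, n_f, n_e):
    """Assumed criterion: every French and English word has at least one link."""
    aligned_f = {i for i, _ in links}
    aligned_e = {j for _, j in links}
    return aligned_f == set(range(n_f)) and aligned_e == set(range(n_e))


def is_proper(links):
    """Assumed criterion: every connected group of links is a full N-to-M block.

    Equivalently, whenever (i, j), (i, j2) and (i2, j) are links, (i2, j2)
    must be a link too.
    """
    by_f = defaultdict(set)  # French position -> linked English positions
    by_e = defaultdict(set)  # English position -> linked French positions
    for i, j in links:
        by_f[i].add(j)
        by_e[j].add(i)
    return all((i2, j2) in links
               for i, j in links
               for i2 in by_e[j]
               for j2 in by_f[i])


# Toy pair: "le chat noir" / "the black cat".
f2e = {(0, 0), (1, 2), (2, 1)}  # le-the, chat-cat, noir-black
e2f = {(0, 0), (2, 1)}          # the-le, cat-chat; "black" left unaligned
inter, union = symmetrise(f2e, e2f)
print(sorted(inter), is_complete(inter, 3, 3))  # [(0, 0), (1, 2)] False
print(sorted(union), is_complete(union, 3, 3), is_proper(union))  # [(0, 0), (1, 2), (2, 1)] True True
```

In this toy example the intersection leaves "noir" and "black" unaligned, so it is not complete, whereas the union is both complete and proper. A link set such as {(0, 0), (0, 1), (1, 1)} fails the propriety check, because French word 1 is linked to English word 1 but not to English word 0, although both English words share French word 0.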
In the following section, we introduce the problem of aligning words from a corpus that is already aligned at the sentence level, and we show how this problem may be phrased in terms of matrix factorisation. We then identify a number of constraints on word alignment, show that these constraints entail that word alignment is equivalent to orthogonal non-negative matrix factorisation, and give a novel …
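The claimed link between alignment and orthogonal non-negative factorisation can be illustrated in its easy direction with a small sketch, under assumptions of our own: each group of mutually aligned words (called a cept here) is given as a pair of French and English position sets, F and E are binary membership matrices with one column per cept, and the alignment matrix is A = F E^T. Because F and E are non-negative, two of their columns are orthogonal exactly when they share no non-zero row, so orthogonal columns mean that no word belongs to two cepts. This does not implement the authors' probabilistic factorisation algorithm; `cept_factors`, `matmul_t`, `gram` and `is_diagonal` are hypothetical helper names.

```python
# Hypothetical sketch: alignment matrix A = F E^T built from binary cept
# membership matrices F (French words x cepts) and E (English words x cepts).


def cept_factors(cepts, n_f, n_e):
    """Build F (n_f x K) and E (n_e x K) from K cepts.

    Each cept is a pair (french_positions, english_positions).
    """
    k = len(cepts)
    F = [[0] * k for _ in range(n_f)]
    E = [[0] * k for _ in range(n_e)]
    for c, (fs, es) in enumerate(cepts):
        for i in fs:
            F[i][c] = 1
        for j in es:
            E[j][c] = 1
    return F, E


def matmul_t(X, Y):
    """Return X Y^T for list-of-lists matrices with equal column counts."""
    return [[sum(x * y for x, y in zip(row_x, row_y)) for row_y in Y]
            for row_x in X]


def gram(X):
    """Return X^T X, whose off-diagonal entries are column inner products."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def is_diagonal(M):
    """True if every off-diagonal entry of the square matrix M is zero."""
    return all(M[a][b] == 0
               for a in range(len(M)) for b in range(len(M)) if a != b)


# Toy pair: "je ne mange pas" / "I do not eat", with a 2-to-2 cept (ne, pas) ~ (do, not).
cepts = [({0}, {0}), ({1, 3}, {1, 2}), ({2}, {3})]
F, E = cept_factors(cepts, n_f=4, n_e=4)
A = matmul_t(F, E)  # rows: French positions, columns: English positions
for row in A:
    print(row)
print(is_diagonal(gram(F)), is_diagonal(gram(E)))  # True True

# Placing one word in two cepts breaks the orthogonality of F.
F_bad, _ = cept_factors([({0}, {0}), ({0}, {1})], n_f=1, n_e=2)
print(is_diagonal(gram(F_bad)))  # False
```

The printed rows of A are [1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1] and [0, 1, 1, 0], so "ne" and "pas" are each linked to both "do" and "not", giving an N-to-M block without any extra machinery. The diagonal entries of F^T F simply count how many French words each cept contains.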
