Đang chuẩn bị nút TẢI XUỐNG, xin hãy chờ
Tải xuống
We cast the word alignment problem as maximizing a submodular function under matroid constraints. Our framework is able to express complex interactions between alignment components while remaining computationally efficient, thanks to the power and generality of submodular functions. We show that submodularity naturally arises when modeling word fertility. Experiments on the English-French Hansards alignment task show that our approach achieves lower alignment error rates compared to conventional matching based approaches. . | Word Alignment via Submodular Maximization over Matroids Hui Lin Dept. of Electrical Engineering University of Washington Seattle WA 98195 USA hlin@ee.washington.edu Jeff Bilmes Dept. of Electrical Engineering University of Washington Seattle Wa 98195 USA bilmes@ee.washington.edu Abstract We cast the word alignment problem as maximizing a submodular function under matroid constraints. Our framework is able to express complex interactions between alignment components while remaining computationally efficient thanks to the power and generality of submodular functions. We show that submodularity naturally arises when modeling word fertility. Experiments on the English-French Hansards alignment task show that our approach achieves lower alignment error rates compared to conventional matching based approaches. 1 Introduction Word alignment is a key component in most statistical machine translation systems. While classical approaches for word alignment are based on generative models e.g. IBM models Brown et al. 1993 and HMM Vogel et al. 1996 word alignment can also be viewed as a matching problem where each word pair is associated with a score reflecting the desirability of aligning that pair and the alignment is then the highest scored matching under some constraints. Several matching-based approaches have been proposed in the past. Melamed 2000 introduces the competitive linking algorithm which greedily constructs matchings under the one-to-one mapping assumption. In Matusov et al. 2004 matchings are found using an algorithm for constructing a maximum weighted bipartite graph matching Schrijver 2003 where word pair scores come from alignment posteriors of generative models. Similarly Taskar et al. 2005 cast word alignment as a maximum weighted matching problem and propose a 170 framework for learning word pair scores as a function of arbitrary features of that pair. These approaches however have two potentially substantial limitations words have fertility of at most .