Đang chuẩn bị nút TẢI XUỐNG, xin hãy chờ
Tải xuống
Synchronous Context-Free Grammars (SCFGs) have been successfully exploited as translation models in machine translation applications. When parsing with an SCFG, computational complexity grows exponentially with the length of the rules, in the worst case. In this paper we examine the problem of factorizing each rule of an input SCFG to a generatively equivalent set of rules, each having the smallest possible length. Our algorithm works in time O(n log n), for each rule of length n. This improves upon previous results and solves an open problem about recognizing permutations that can be factored. . | Factoring Synchronous Grammars By Sorting Daniel Gildea Computer Science Dept. University of Rochester Rochester NY 14627 Giorgio Satta Dept. of Information Eng g University of Padua I-35131 Padua Italy Hao Zhang Computer Science Dept. University of Rochester Rochester NY 14627 Abstract Synchronous Context-Free Grammars SCFGs have been successfully exploited as translation models in machine translation applications. When parsing with an SCFG computational complexity grows exponentially with the length of the rules in the worst case. In this paper we examine the problem of factorizing each rule of an input SCFG to a generatively equivalent set of rules each having the smallest possible length. Our algorithm works in time O n log n for each rule of length n. This improves upon previous results and solves an open problem about recognizing permutations that can be factored. 1 Introduction Synchronous Context-Free Grammars SCFGs are a generalization of the Context-Free Grammar CFG formalism to simultaneously produce strings in two languages. SCFGs have a wide range of applications including machine translation word and phrase alignments and automatic dictionary construction. Variations of SCFGs go back to Aho and Ullman 1972 s Syntax-Directed Translation Schemata but also include the Inversion Transduction Grammars in Wu 1997 which restrict grammar rules to be binary the synchronous grammars in Chiang 2005 which use only a single nonterminal symbol and the Multitext Grammars in Melamed 2003 which allow independent rewriting as well as other tree-based models such as Yamada and Knight 2001 and Galley et al. 2004 . When viewed as a rewriting system an SCFG generates a set of string pairs representing some translation relation. We are concerned here with the time complexity of parsing such a pair according to the grammar. Assume then a pair with each string having a maximum length of N and consider an SCFG G of size G with a bound of n nonterminals in the right-hand side .