Scientific report: "Issues Concerning Decoding with Synchronous Context-free Grammar"

Tagyoung Chung, Licheng Fang and Daniel Gildea
Department of Computer Science
University of Rochester
Rochester, NY 14627

Abstract

We discuss some of the practical issues that arise from decoding with general synchronous context-free grammars. We examine problems caused by unary rules, and we also examine how virtual nonterminals resulting from binarization can best be handled. We also investigate adding more flexibility to synchronous context-free grammars by adding glue rules and phrases.

1 Introduction

Synchronous context-free grammar (SCFG) is widely used for machine translation. There are many different ways to extract SCFGs from data. Hiero (Chiang, 2005) represents a more restricted form of SCFG, while GHKM (Galley et al., 2004) uses a general form of SCFG. In this paper, we discuss some of the practical issues that arise from decoding general SCFGs that are seldom discussed in the literature. We focus on parsing grammars extracted using the method put forth by Galley et al. (2004), but the solutions to these issues are applicable to other general forms of SCFG with many nonterminals.

The GHKM grammar extraction method produces a large number of unary rules. Unary rules are rules that have exactly one nonterminal and no terminals on the source side. They may be problematic for decoders since they may create cycles, which are unary production chains that contain duplicated dynamic programming states. In later sections, we discuss why unary rules are problematic and investigate two possible solutions.
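To make the notion of a unary cycle concrete, the following is a minimal sketch that checks whether a set of unary rules can chain back to a nonterminal already visited. It is a grammar-level simplification, not the solution investigated in the paper: a real decoder would look for repeated dynamic programming states over a particular span. The function name `find_unary_cycle` and the representation of rules as `(lhs, rhs)` pairs are assumptions made for illustration.

```python
from collections import defaultdict


def find_unary_cycle(unary_rules):
    """Return one cycle of nonterminals as a list, or None if none exists.

    unary_rules: iterable of (lhs, rhs) pairs, each standing for a rule
    lhs -> rhs whose source side is a single nonterminal (hypothetical
    representation, chosen for this sketch).
    """
    graph = defaultdict(list)
    for lhs, rhs in unary_rules:
        graph[lhs].append(rhs)

    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = defaultdict(int)
    path = []                      # nonterminals on the current chain

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for child in graph.get(node, []):
            if color[child] == GRAY:
                # child is already on the current chain: a cycle
                return path[path.index(child):] + [child]
            if color[child] == WHITE:
                cycle = visit(child)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


cyclic = find_unary_cycle([("NP", "NN"), ("NN", "NP"), ("S", "VP")])
acyclic = find_unary_cycle([("S", "VP"), ("VP", "VB")])
```

Here `cyclic` holds the chain `NP`, `NN`, `NP`, while `acyclic` is `None`. A depth-first search is used because it reports the offending chain itself, which is more useful for inspecting a grammar than a yes/no answer.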
GHKM grammars often have rules with many right-hand-side nonterminals and require binarization to ensure O(n^3) time parsing. However, binarization creates a large number of virtual nonterminals. We discuss the challenges of, and possible solutions to, issues arising from having a large number of virtual nonterminals. We also compare binarizing the grammar with filtering rules according to scope, a concept ...
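As an illustration of why binarization multiplies nonterminals, the sketch below applies a simple left-branching binarization to the source side of a single rule. It is an assumption-laden simplification: it ignores the target side entirely, whereas binarizing a synchronous rule must also respect the reordering between the two sides. The function name `binarize` and the `V[...]` naming scheme for virtual nonterminals are hypothetical.

```python
def binarize(lhs, rhs):
    """Left-branching binarization of one rule lhs -> rhs.

    rhs is a list of right-hand-side symbols. Returns a list of
    (lhs, [symbols]) rules with at most two right-hand-side symbols each.
    Virtual nonterminals are named after the symbols they cover, e.g.
    "V[NP+VP]" (a hypothetical naming scheme for this sketch).
    """
    if len(rhs) <= 2:
        return [(lhs, list(rhs))]

    rules = []
    left = rhs[0]
    covered = [rhs[0]]
    for symbol in rhs[1:-1]:
        covered.append(symbol)
        virtual = "V[" + "+".join(covered) + "]"
        rules.append((virtual, [left, symbol]))   # new virtual nonterminal
        left = virtual
    rules.append((lhs, [left, rhs[-1]]))          # original left-hand side
    return rules


binary_rules = binarize("S", ["NP", "VP", "PP", "ADVP"])
for rule_lhs, rule_rhs in binary_rules:
    print(rule_lhs, "->", " ".join(rule_rhs))
```

A rule with k right-hand-side symbols (k > 2) yields k - 1 binary rules and k - 2 virtual nonterminals under this scheme, so a grammar with many long rules quickly accumulates virtual symbols unless they are shared across rules.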
