TAILIEUCHUNG - Báo cáo khoa học: "Alternative Approaches for Generating Bodies of Grammar Rules"

We compare two approaches for describing and generating bodies of rules used for natural language parsing. In today’s parsers rule bodies do not exist a priori but are generated on the fly, usually with methods based on n-grams, which are one particular way of inducing probabilistic regular languages. We compare two approaches for inducing such languages. One is based on n-grams, the other on minimization of the Kullback-Leibler divergence. | Alternative Approaches for Generating Bodies of Grammar Rules Gabriel Infante-Lopez and Maarten de Rijke Informatics Institute University of Amsterdam infante mdr @ Abstract We compare two approaches for describing and generating bodies of rules used for natural language parsing. In today s parsers rule bodies do not exist a priori but are generated on the fly usually with methods based on n-grams which are one particular way of inducing probabilistic regular languages. We compare two approaches for inducing such languages. One is based on n-grams the other on minimization of the Kullback-Leibler divergence. The inferred regular languages are used for generating bodies of rules inside a parsing procedure. We compare the two approaches along two dimensions the quality of the probabilistic regular language they produce and the performance of the parser they were used to build. The second approach outperforms the first one along both dimensions. 1 Introduction N-grams have had a big impact on the state of the art in natural language parsing. They are central to many parsing models Charniak 1997 Collins 1997 2000 Eisner 1996 and despite their simplicity n-gram models have been very successful. Modeling with n-grams is an induction task Gold 1967 . Given a sample set of strings the task is to guess the grammar that produced that sample. Usually the grammar is not be chosen from an arbitrary set of possible grammars but from a given class. Hence grammar induction consists of two parts choosing the class of languages amongst which to search and designing the procedure for performing the search. By using n-grams for grammar induction one addresses the two parts in one go. In particular the use of n-grams implies that the solution will be searched for in the class of probabilistic regular languages since n-grams induce probabilistic automata and consequently probabilistic regular languages. However the class of probabilistic regular languages induced using .

TỪ KHÓA LIÊN QUAN
TAILIEUCHUNG - Chia sẻ tài liệu không giới hạn
Địa chỉ : 444 Hoang Hoa Tham, Hanoi, Viet Nam
Website : tailieuchung.com
Email : tailieuchung20@gmail.com
Tailieuchung.com là thư viện tài liệu trực tuyến, nơi chia sẽ trao đổi hàng triệu tài liệu như luận văn đồ án, sách, giáo trình, đề thi.
Chúng tôi không chịu trách nhiệm liên quan đến các vấn đề bản quyền nội dung tài liệu được thành viên tự nguyện đăng tải lên, nếu phát hiện thấy tài liệu xấu hoặc tài liệu có bản quyền xin hãy email cho chúng tôi.
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.