Corpus-Oriented Development of Japanese HPSG Parsers

Kazuhiro Yoshida
Department of Computer Science, University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033
kyoshida@

Abstract

This paper reports the corpus-oriented development of a wide-coverage Japanese HPSG parser. We first created an HPSG treebank from the EDR corpus by using heuristic conversion rules, and then extracted lexical entries from the treebank. The grammar developed using this method attained wide coverage that could hardly be obtained by conventional manual development. We also trained a statistical parser for the grammar on the treebank, and evaluated the parser in terms of the accuracy of semantic-role identification and dependency analysis.

1 Introduction

In this study, we report the corpus-oriented development of a Japanese HPSG parser using the EDR Japanese corpus (2002). Although several researchers have attempted to utilize linguistic grammar theories, such as LFG (Bresnan and Kaplan, 1982), CCG (Steedman, 2001), and HPSG (Pollard and Sag, 1994), for parsing real-world texts, such attempts have rarely succeeded, because manual development of wide-coverage, linguistically motivated grammars involves years of labor-intensive effort. Corpus-oriented grammar development is a grammar development method that has been proposed as a promising substitute for conventional manual development.
In corpus-oriented methods, a treebank of a target grammar is constructed first, and various grammatical constraints are extracted from the treebank. Previous studies reported that wide-coverage grammars can be obtained at low cost by using this method (Hockenmaier and Steedman, 2002; Miyao et al., 2004). The treebank can also be used for training statistical disambiguation models, and hence we can construct a statistical parser for the extracted grammar. The corpus-oriented method enabled us to develop a Japanese HPSG parser with semantic information whose coverage on real-world sentences is . This high coverage
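
The extraction step described above (build a treebank first, then read lexical entries off it) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy treebank, the rule name `head_complement`, the flat category labels, and the helper names `leaves`, `extract_lexicon`, and `lexical_coverage` do not come from the paper. Real HPSG lexical entries are typed feature structures, and real coverage is measured by whether the parser produces an analysis, so the word-level check here is only a rough proxy.

```python
from collections import defaultdict

# Hypothetical toy treebank. A leaf is (word, category); an internal node is
# (rule_name, [children]). The labels are illustrative, not real HPSG types.
TREEBANK = [
    ("head_complement", [
        ("head_complement", [("hon", "noun"), ("wo", "case_particle")]),
        ("yomu", "verb"),
    ]),
    ("head_complement", [
        ("head_complement", [("hon", "noun"), ("ga", "case_particle")]),
        ("aru", "verb"),
    ]),
]


def leaves(tree):
    """Yield (word, category) pairs from the leaves of a tree, left to right."""
    label, rest = tree
    if isinstance(rest, list):
        # Internal node: recurse into each child.
        for child in rest:
            yield from leaves(child)
    else:
        # Leaf node: label is the word, rest is its category.
        yield label, rest


def extract_lexicon(treebank):
    """Collect, for each word, the set of categories it carries in the treebank."""
    lexicon = defaultdict(set)
    for tree in treebank:
        for word, category in leaves(tree):
            lexicon[word].add(category)
    return dict(lexicon)


def lexical_coverage(lexicon, sentences):
    """Fraction of sentences in which every word has at least one lexical entry."""
    if not sentences:
        return 0.0
    covered = sum(all(word in lexicon for word in sent) for sent in sentences)
    return covered / len(sentences)


lexicon = extract_lexicon(TREEBANK)
test_sentences = [
    ["hon", "wo", "yomu"],
    ["hon", "ga", "aru"],
    ["zasshi", "wo", "yomu"],  # "zasshi" never occurs in the toy treebank
]
coverage = lexical_coverage(lexicon, test_sentences)
print(sorted(lexicon))
print(coverage)
```

In this sketch the third test sentence is not covered because one of its words has no extracted entry, which shows why the size and variety of the treebank directly bound the coverage of an extracted lexicon.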
