Improving the Accuracy of Subcategorizations Acquired from Corpora

Naoki Yoshinaga
Department of Computer Science, University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033
yoshinag@

Abstract

This paper presents a method of improving the accuracy of subcategorization frames (SCFs) acquired from corpora to augment existing lexicon resources. I estimate a confidence value of each SCF using corpus-based statistics, and then perform clustering of SCF confidence-value vectors for words to capture co-occurrence tendencies among SCFs in the lexicon. I apply my method to SCFs acquired from corpora using lexicons of two large-scale lexicalized grammars. The resulting SCFs achieve higher precision and recall than SCFs obtained by a naive frequency cut-off.

1 Introduction

Recently, a variety of methods have been proposed for the acquisition of subcategorization frames (SCFs) from corpora (surveyed in Korhonen, 2002). One interesting possibility is to use these techniques to improve the coverage of existing large-scale lexicon resources, such as the lexicons of lexicalized grammars. However, there has been little work on evaluating the impact of acquired SCFs, with the exception of Carroll and Fang (2004). The problem when we integrate acquired SCFs into existing lexicalized grammars is the lower quality of the acquired SCFs, since they are acquired in an unsupervised manner rather than being manually coded. If we attempt to compensate for the poor precision by being less strict in filtering out less likely SCFs, then we will end up with a larger number of noisy lexical entries, which is problematic for parsing with lexicalized grammars (Sarkar et al., 2000). We thus need some method of selecting the most reliable set of SCFs from the system output, as demonstrated in Korhonen (2002). In this paper, I present a method of improving the accuracy of SCFs acquired from corpora in order to augment existing lexicon resources. I first estimate a confidence value that a word can have each SCF using
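The abstract describes a two-step pipeline (a corpus-based confidence value for each SCF of a word, then clustering of the per-word confidence-value vectors) and compares it against a naive frequency cut-off. The following is a minimal Python sketch of that idea under stated assumptions, not the paper's actual estimator. All of these choices are hypothetical: the confidence of an SCF is taken to be the probability that its true relative frequency for the word exceeds `threshold` under a uniform Beta prior; words are grouped with plain k-means using farthest-first initialisation; and a word keeps an SCF when the average of its own confidence and its cluster centroid's value reaches `accept`. The SCF labels and counts in `EXAMPLE_COUNTS` are made-up toy data.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical input format: word -> {SCF label -> frequency observed in the corpus}.
Counts = Dict[str, Dict[str, int]]

# Made-up toy counts, for illustration only.
EXAMPLE_COUNTS: Counts = {
    "give": {"NP_NP": 40, "NP_PP": 50, "NP": 10},
    "send": {"NP_NP": 30, "NP_PP": 60, "NP": 10},
    "lend": {"NP_PP": 40, "NP_NP": 1},  # NP_NP is rarely observed for this word
    "eat": {"NP": 90, "INTRANS": 10},
    "drink": {"NP": 70, "INTRANS": 30},
}


def naive_cutoff(counts: Counts, threshold: float) -> Dict[str, List[str]]:
    """Baseline: keep an SCF if its relative frequency for the word is at least `threshold`."""
    result = {}
    for word, scf_counts in counts.items():
        total = sum(scf_counts.values())
        result[word] = sorted(s for s, c in scf_counts.items() if c / total >= threshold)
    return result


def confidence(count: int, total: int, threshold: float) -> float:
    """Assumed confidence: P(true relative frequency > threshold) under a
    Beta(count + 1, total - count + 1) posterior (uniform prior).

    For integer Beta parameters this equals P(Binomial(total + 1, threshold) <= count).
    """
    n = total + 1
    return sum(math.comb(n, j) * threshold ** j * (1 - threshold) ** (n - j)
               for j in range(count + 1))


def _sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[List[float]], k: int,
           iterations: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means; farthest-first initialisation keeps the sketch deterministic."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(_sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid, then move centroids to member means.
        assignment = [min(range(k), key=lambda c: _sq_dist(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def select_scfs(counts: Counts, threshold: float, k: int,
                accept: float = 0.5) -> Dict[str, List[str]]:
    """Cluster per-word SCF confidence vectors and back sparse counts with the cluster centroid."""
    scfs = sorted({s for scf_counts in counts.values() for s in scf_counts})
    words = sorted(counts)
    vectors = []
    for w in words:
        total = sum(counts[w].values())
        vectors.append([confidence(counts[w].get(s, 0), total, threshold) for s in scfs])
    assignment, centroids = kmeans(vectors, k)
    result = {}
    for w, own, a in zip(words, vectors, assignment):
        # Hypothetical decision rule: average the word's own confidence with its centroid's.
        result[w] = [s for s, o, m in zip(scfs, own, centroids[a]) if (o + m) / 2 >= accept]
    return result


if __name__ == "__main__":
    print(naive_cutoff(EXAMPLE_COUNTS, 0.05)["lend"])
    print(select_scfs(EXAMPLE_COUNTS, 0.05, k=2)["lend"])
```

With these toy counts, the 0.05 cut-off keeps only `NP_PP` for `lend`, because `NP_NP` was seen once in 41 occurrences. The clustered selection also keeps `NP_NP`, since `lend` falls into the same cluster as `give` and `send`, whose vectors are confident about that frame. Averaging a word's own confidence with its centroid is only one simple way to let the co-occurrence pattern of similar words support sparse counts.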
