Scientific report: "Improving the Accuracy of Subcategorizations Acquired from Corpora"


Improving the Accuracy of Subcategorizations Acquired from Corpora

Naoki Yoshinaga
Department of Computer Science, University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033
yoshinag@is.s.u-tokyo.ac.jp

Abstract

This paper presents a method of improving the accuracy of subcategorization frames (SCFs) acquired from corpora to augment existing lexicon resources. I estimate a confidence value of each SCF using corpus-based statistics, and then perform clustering of SCF confidence-value vectors for words to capture co-occurrence tendency among SCFs in the lexicon. I apply my method to SCFs acquired from corpora using lexicons of two large-scale lexicalized grammars. The resulting SCFs achieve higher precision and recall compared to SCFs obtained by naive frequency cut-off.

1 Introduction

Recently, a variety of methods have been proposed for acquisition of subcategorization frames (SCFs) from corpora (surveyed in (Korhonen, 2002)). One interesting possibility is to use these techniques to improve the coverage of existing large-scale lexicon resources such as lexicons of lexicalized grammars. However, there has been little work on evaluating the impact of acquired SCFs, with the exception of (Carroll and Fang, 2004).

The problem when we integrate acquired SCFs into existing lexicalized grammars is lower quality of the acquired SCFs, since they are acquired in an unsupervised manner rather than being manually coded. If we attempt to compensate for the poor precision by being less strict in filtering out less likely SCFs, then we will end up with a larger number of noisy lexical entries, which is problematic for parsing with lexicalized grammars (Sarkar et al., 2000). We thus need some method of selecting the most reliable set of SCFs from the system output, as demonstrated in (Korhonen, 2002).

In this paper, I present a method of improving the accuracy of SCFs acquired from corpora in order to augment existing lexicon resources. I first estimate a confidence value that a word can have each SCF using

