Scientific report: "Unsupervised Lexicon-Based Resolution of Unknown Words for Full Morphological Analysis"



Meni Adler, Yoav Goldberg, David Gabay, and Michael Elhadad
Ben Gurion University of the Negev, Department of Computer Science, POB 653, Be'er Sheva 84105, Israel
{adlerm,goldberg,gabayd,elhadad}@cs.bgu.ac.il

Abstract

Morphological disambiguation proceeds in 2 stages: (1) an analyzer provides all possible analyses for a given token and (2) a stochastic disambiguation module picks the most likely analysis in context. When the analyzer does not recognize a given token, we hit the problem of unknowns. In large scale corpora, unknowns appear at a rate of 5 to 10% (depending on the genre and the maturity of the lexicon). We address the task of computing the distribution p(t|w) for unknown words for full morphological disambiguation in Hebrew. We introduce a novel algorithm that is language independent: it exploits a maximum entropy letters model trained over the known words observed in the corpus and the distribution of the unknown words in known tag contexts, through iterative approximation. The algorithm achieves 30% error reduction on disambiguation of unknown words over a competitive baseline (to a level of 70% accurate full disambiguation of unknown words). We have also verified that taking advantage of a strong language-specific model of morphological patterns provides the same level of disambiguation.
The algorithm we have developed exploits distributional information latent in a wide-coverage lexicon and large quantities of unlabeled data.

This work is supported in part by the Lynn and William Frankel Center for Computer Science.

1 Introduction

The term unknowns denotes tokens in a text that cannot be resolved in a given lexicon. For the task of full morphological analysis, the lexicon must provide all possible morphological analyses for any given token. In this case, unknown tokens can be categorized into two classes of missing information: unknown tokens are not recognized at all by the .
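
To make the idea of estimating p(t|w) for an unknown word from the letters of known words more concrete, here is a minimal sketch. It is not the method of the paper: the abstract describes a maximum entropy letters model combined with iterative approximation over tag contexts, whereas this sketch substitutes plain suffix counts with backoff to shorter suffixes. The function names, the toy English lexicon, and the tag labels are all hypothetical and chosen only for readability.

```python
from collections import Counter, defaultdict


def train_suffix_model(lexicon, max_suffix=3):
    """Count tags per word-final letter sequence over the known words.

    `lexicon` maps each known word to the list of tags its analyses allow.
    Suffix length 0 (the empty string) acts as a global tag prior.
    """
    counts = defaultdict(Counter)
    for word, tags in lexicon.items():
        for tag in tags:
            for k in range(min(max_suffix, len(word)) + 1):
                suffix = word[len(word) - k:]
                counts[suffix][tag] += 1
    return counts


def tag_distribution(word, counts, max_suffix=3):
    """Estimate p(t|w) from the longest suffix of `word` seen in training."""
    for k in range(min(max_suffix, len(word)), -1, -1):
        suffix = word[len(word) - k:]
        if suffix in counts:
            total = sum(counts[suffix].values())
            return {tag: n / total for tag, n in counts[suffix].items()}
    return {}  # only reached when the lexicon is empty


# Hypothetical toy lexicon (assumption: English words, coarse tags).
lexicon = {
    "walking": ["VERB"],
    "talking": ["VERB"],
    "building": ["NOUN", "VERB"],
    "quickly": ["ADV"],
    "slowly": ["ADV"],
}
counts = train_suffix_model(lexicon)

# "blogging" is not in the lexicon, so it plays the role of an unknown token.
dist = tag_distribution("blogging", counts)
print(dist)
```

In a two-stage setup like the one the abstract describes, a distribution of this kind would stand in for the analyzer's missing output, and the contextual disambiguation module would then choose among the proposed tags.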

