Improved Modeling of Out-Of-Vocabulary Words Using Morphological Classes

Thomas Müller and Hinrich Schütze
Institute for Natural Language Processing
University of Stuttgart, Germany
muellets@ims.uni-stuttgart.de

Abstract

We present a class-based language model that clusters rare words of similar morphology together. The model improves the prediction of words after histories containing out-of-vocabulary words. The morphological features used are obtained without the use of labeled data. The perplexity improvement compared to a state-of-the-art Kneser-Ney model is 4% overall and 81% on unknown histories.

1 Introduction

One of the challenges in statistical language modeling is words that appear in the recognition task at hand but not in the training set, so-called out-of-vocabulary (OOV) words. Especially for productive languages it is often necessary to at least reduce the number of OOVs. We present a novel approach to handling OOV words in language modeling for English, based on morphological classes. Previous work on morphological classes in English has not been able to show noticeable improvements in perplexity. In this article, class-based language models as proposed by Brown et al. (1992) are used to tackle the problem. Our model improves the perplexity of a Kneser-Ney (KN) model for English by 4%, the largest improvement of a state-of-the-art model for English due to morphological modeling that we are aware of.

A class-based language model groups words into classes and replaces the word transition probability by a class transition probability and a word emission probability:

$P(w_3 \mid w_1 w_2) = P(c_3 \mid c_1 c_2) \cdot P(w_3 \mid c_3)$  (1)

Brown et al. and many other authors primarily use context information for clustering. Niesler et al. (1998) showed that context clustering works better than clusters based on part-of-speech tags. However, since the context of an OOV word is unknown and it therefore cannot be assigned to a cluster, OOV words are as much a problem for a context-based class model as for a word-based model.
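To make the decomposition in equation (1) concrete, the following is a minimal sketch of a class-based trigram model with a suffix-based fallback that assigns OOV words to morphological classes. The class labels, suffix rules, and toy probabilities are illustrative assumptions, not the paper's actual clustering or estimates.

```python
from collections import defaultdict

# Hypothetical word -> class map learned from training data.
WORD_CLASS = {"the": "DET", "dogs": "NOUN_PL", "bark": "VERB", "walked": "VERB_PAST"}

def morph_class(word):
    """Map a word to a class; OOV words fall back to crude suffix rules."""
    if word in WORD_CLASS:
        return WORD_CLASS[word]
    if word.endswith("ed"):
        return "VERB_PAST"   # e.g. unseen "strolled"
    if word.endswith("s"):
        return "NOUN_PL"     # e.g. unseen "wombats"
    return "UNK"             # catch-all class for everything else

# Toy, unsmoothed probability tables (purely illustrative values).
CLASS_TRIGRAM = defaultdict(float, {("DET", "NOUN_PL", "VERB_PAST"): 0.4})
CLASS_EMISSION = defaultdict(float, {("VERB_PAST", "walked"): 0.05})

def class_lm_prob(w1, w2, w3):
    """P(w3 | w1 w2) = P(c3 | c1 c2) * P(w3 | c3), as in equation (1)."""
    c1, c2, c3 = morph_class(w1), morph_class(w2), morph_class(w3)
    return CLASS_TRIGRAM[(c1, c2, c3)] * CLASS_EMISSION[(c3, w3)]

if __name__ == "__main__":
    # "wombats" is OOV but its suffix still maps it to NOUN_PL, so the
    # history (the, wombats) is not collapsed to an unknown token.
    print(class_lm_prob("the", "wombats", "walked"))  # 0.4 * 0.05 = 0.02
```

The point of the sketch is the fallback in morph_class: a word-based or context-clustered model has no usable representation for an unseen history word, whereas a morphological class assignment lets the class transition probability remain informative.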