Improved Modeling of Out-Of-Vocabulary Words Using Morphological Classes

Thomas Müller and Hinrich Schütze
Institute for Natural Language Processing, University of Stuttgart, Germany
muellets@

Abstract

We present a class-based language model that clusters rare words of similar morphology together. The model improves the prediction of words after histories containing out-of-vocabulary words. The morphological features used are obtained without the use of labeled data. The perplexity improvement compared to a state-of-the-art Kneser-Ney model is 4% overall and 81% on unknown histories.

1 Introduction

One of the challenges in statistical language modeling is words that appear in the recognition task at hand but not in the training set, so-called out-of-vocabulary (OOV) words. Especially for productive languages, it is often necessary to at least reduce the number of OOVs. We present a novel approach, based on morphological classes, to handling OOV words in language modeling for English. Previous work on morphological classes in English has not been able to show noticeable improvements in perplexity. In this article, class-based language models as proposed by Brown et al. (1992) are used to tackle the problem. Our model improves the perplexity of a Kneser-Ney (KN) model for English by 4%, the largest improvement of a state-of-the-art model for English due to morphological modeling that we are aware of.

A class-based language model groups words into classes and replaces the word transition probability by a class transition probability and a word emission probability:

$$P(W_3 \mid W_1 W_2) = P(C_3 \mid C_1 C_2) \cdot P(W_3 \mid C_3) \qquad (1)$$

Brown et al. and many other authors primarily use context information for clustering. Niesler et al. (1998) showed that context clustering works better than clusters based on part-of-speech tags. However, since the context of an OOV word is unknown, the word cannot be assigned to a cluster. OOV words are therefore as much a problem to a context-based class model as to a word model.
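As a rough illustration of the class-based factorization described above, the following Python sketch estimates the class transition and word emission probabilities from raw counts. It is a minimal sketch under stated assumptions, not the authors' implementation: the word-to-class mapping, the toy sentences, and the function names are hypothetical, and the estimates are unsmoothed maximum likelihood rather than the Kneser-Ney estimates used in the paper.

```python
from collections import Counter

# Hypothetical toy mapping from words to classes (an assumption for
# illustration; the paper derives its classes from morphology).
WORD_TO_CLASS = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "dog": "NOUN",
    "sleeps": "VERB", "barks": "VERB",
}


def train(sentences, word_to_class):
    """Collect the counts needed for the class trigram factorization."""
    counts = {
        "class_trigrams": Counter(),   # counts of (c1, c2, c3)
        "class_histories": Counter(),  # counts of (c1, c2) followed by some c3
        "words": Counter(),            # counts of each word
        "classes": Counter(),          # counts of each class
    }
    for sentence in sentences:
        classes = [word_to_class[w] for w in sentence]
        for word, cls in zip(sentence, classes):
            counts["words"][word] += 1
            counts["classes"][cls] += 1
        for i in range(2, len(classes)):
            counts["class_trigrams"][tuple(classes[i - 2:i + 1])] += 1
            counts["class_histories"][tuple(classes[i - 2:i])] += 1
    return counts


def class_trigram_prob(w1, w2, w3, counts, word_to_class):
    """Return P(c3 | c1 c2) * P(w3 | c3) using unsmoothed relative frequencies."""
    c1, c2, c3 = (word_to_class[w] for w in (w1, w2, w3))
    history = counts["class_histories"][(c1, c2)]
    if history == 0 or counts["classes"][c3] == 0:
        return 0.0
    transition = counts["class_trigrams"][(c1, c2, c3)] / history
    emission = counts["words"][w3] / counts["classes"][c3]
    return transition * emission


# Toy training data (hypothetical).
SENTENCES = [
    ["the", "cat", "sleeps"],
    ["a", "dog", "barks"],
    ["the", "dog", "sleeps"],
]
MODEL = train(SENTENCES, WORD_TO_CLASS)

# "the cat barks" never occurs as a word trigram, yet it receives a
# nonzero probability because the class sequence DET NOUN VERB was seen.
print(class_trigram_prob("the", "cat", "barks", MODEL, WORD_TO_CLASS))
```

In this sketch, looking up a word that is missing from the mapping raises a `KeyError`. That mirrors the difficulty noted above: a word with no class assignment cannot take part in the class history.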
