Scientific report: "Measuring Contextual Fitness Using Error Contexts Extracted from the Wikipedia Revision History"

Torsten Zesch
Ubiquitous Knowledge Processing Lab (UKP-DIPF), German Institute for Educational Research and Educational Information, Frankfurt
Ubiquitous Knowledge Processing Lab (UKP-TUDA), Department of Computer Science, Technische Universität Darmstadt

Abstract

We evaluate measures of contextual fitness on the task of detecting real-word spelling errors. For that purpose, we extract naturally occurring errors and their contexts from the Wikipedia revision history. We show that such natural errors are better suited for evaluation than the previously used artificially created errors. In particular, the precision of statistical methods has been largely over-estimated, while the precision of knowledge-based approaches has been under-estimated. Additionally, we show that knowledge-based approaches can be improved by using semantic relatedness measures that make use of knowledge beyond classical taxonomic relations. Finally, we show that statistical and knowledge-based methods can be combined for increased performance.
1 Introduction

Measuring the contextual fitness of a term in its context is a key component in different NLP applications like speech recognition (Inkpen and Desilets, 2005), optical character recognition (Wick et al., 2007), co-reference resolution (Bean and Riloff, 2004), or malapropism detection (Bolshakov and Gelbukh, 2003). The main idea is always to test what fits better into the current context: the actual term or a possible replacement that is phonetically, structurally, or semantically similar. We are going to focus on malapropism detection, as it allows evaluating measures of contextual fitness in a more direct way than evaluating in a complex application, which always entails influence from other components, e.g. the quality of the optical character recognition module (Walker et al., 2010). A malapropism or real-word spelling error occurs when a word is ...
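The core test described in the introduction, whether the actual term or a similar replacement fits the context better, can be illustrated with a minimal sketch. This is not the method evaluated in the paper: it assumes a toy bigram-count score as the fitness measure, and the corpus, the function names (`bigram_counts`, `fitness`, `better_fit`), and the candidate list are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def bigram_counts(corpus):
    """Count adjacent word pairs in a list of tokenized sentences."""
    counts = Counter()
    for sentence in corpus:
        for left, right in zip(sentence, sentence[1:]):
            counts[(left, right)] += 1
    return counts


def fitness(tokens, index, word, counts):
    """Toy fitness score (an assumption, not the paper's measure):
    the sum of the bigram counts that `word` forms with its left
    and right neighbours at position `index`."""
    score = 0
    if index > 0:
        score += counts[(tokens[index - 1], word)]
    if index + 1 < len(tokens):
        score += counts[(word, tokens[index + 1])]
    return score


def better_fit(tokens, index, candidates, counts):
    """Return the candidate that fits strictly better than the
    actual term at `index`, or None if the actual term fits best."""
    best_word = None
    best_score = fitness(tokens, index, tokens[index], counts)
    for candidate in candidates:
        score = fitness(tokens, index, candidate, counts)
        if score > best_score:
            best_word, best_score = candidate, score
    return best_word


# Hypothetical toy corpus; a real system would use large n-gram counts.
TOY_CORPUS = [
    "a piece of cake".split(),
    "a piece of paper".split(),
    "peace and quiet".split(),
    "war and peace".split(),
]
COUNTS = bigram_counts(TOY_CORPUS)

sentence = "a peace of cake".split()
suggestion = better_fit(sentence, 1, ["piece"], COUNTS)
print(suggestion)
```

Here "peace" is a correctly spelled word, so a dictionary lookup would not flag it; only the comparison against the context does. Requiring a strictly higher score keeps the actual term whenever the evidence is tied, which is one simple way to limit false alarms.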
