Measuring Contextual Fitness Using Error Contexts Extracted from the Wikipedia Revision History

Torsten Zesch
Ubiquitous Knowledge Processing Lab (UKP-DIPF), German Institute for Educational Research and Educational Information, Frankfurt
Ubiquitous Knowledge Processing Lab (UKP-TUDA), Department of Computer Science, Technische Universität Darmstadt

Abstract

We evaluate measures of contextual fitness on the task of detecting real-word spelling errors. For that purpose, we extract naturally occurring errors and their contexts from the Wikipedia revision history. We show that such natural errors are better suited for evaluation than the previously used artificially created errors. In particular, the precision of statistical methods has been largely over-estimated, while the precision of knowledge-based approaches has been under-estimated. Additionally, we show that knowledge-based approaches can be improved by using semantic relatedness measures that make use of knowledge beyond classical taxonomic relations. Finally, we show that statistical and knowledge-based methods can be combined for increased performance.
1 Introduction

Measuring the contextual fitness of a term in its context is a key component in different NLP applications like speech recognition (Inkpen and Desilets, 2005), optical character recognition (Wick et al., 2007), co-reference resolution (Bean and Riloff, 2004), or malapropism detection (Bolshakov and Gelbukh, 2003). The main idea is always to test what fits better into the current context: the actual term or a possible replacement that is phonetically, structurally, or semantically similar. We are going to focus on malapropism detection, as it allows evaluating measures of contextual fitness in a more direct way than evaluating in a complex application, which always entails influence from other components, e.g. the quality of the optical character recognition module (Walker et al., 2010). A malapropism or real-word spelling error occurs when a word is ...
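The test described above (does the observed term or a similar replacement fit the context better?) can be illustrated with a minimal statistical sketch. It is not the method evaluated in the paper. The toy corpus, the confusion sets, the add-one smoothed bigram score, and the names `bigram_counts`, `fitness`, and `detect` are all assumptions made for illustration.

```python
from collections import Counter

# Toy corpus: an assumption for illustration only, not data from the paper.
CORPUS = [
    "the weather is nice today",
    "i do not know whether it will rain",
    "the weather will change soon",
    "ask whether the train is late",
]

# Hypothetical confusion sets: for each word, the similar words it may be mistaken for.
CONFUSION_SETS = {"weather": ["whether"], "whether": ["weather"]}


def bigram_counts(sentences):
    """Count adjacent token pairs, with sentence boundary markers."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


def fitness(tokens, i, candidate, counts):
    """Score how well `candidate` fits at position i.

    The score is the product of the add-one smoothed counts of the
    bigrams formed with the left and the right neighbour.
    """
    padded = ["<s>"] + tokens + ["</s>"]
    j = i + 1  # position of the target word inside `padded`
    left = counts[(padded[j - 1], candidate)]
    right = counts[(candidate, padded[j + 1])]
    return (left + 1) * (right + 1)


def detect(sentence, counts, confusion_sets):
    """Flag (position, observed word, better-fitting alternative) triples."""
    tokens = sentence.split()
    flagged = []
    for i, token in enumerate(tokens):
        observed = fitness(tokens, i, token, counts)
        for alternative in confusion_sets.get(token, []):
            if fitness(tokens, i, alternative, counts) > observed:
                flagged.append((i, token, alternative))
    return flagged


COUNTS = bigram_counts(CORPUS)
print(detect("i do not know weather it will rain", COUNTS, CONFUSION_SETS))
```

In this sketch a word is flagged only when an alternative from its confusion set scores strictly higher in the same position. A knowledge-based variant would replace `fitness` with a semantic relatedness score between the candidate and the surrounding words.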
