Scientific report: "Memory-Based Learning: Using Similarity for Smoothing"

Jakub Zavrel and Walter Daelemans
Computational Linguistics, Tilburg University
PO Box 90153, 5000 LE Tilburg, The Netherlands
zavrel walter @

Abstract

This paper analyses the relation between the use of similarity in Memory-Based Learning and the notion of backed-off smoothing in statistical language modeling. We show that the two approaches are closely related, and we argue that feature weighting methods in the Memory-Based paradigm can offer the advantage of automatically specifying a suitable domain-specific hierarchy between most specific and most general conditioning information without the need for a large number of parameters. We report two applications of this approach: PP-attachment and POS-tagging. Our method achieves state-of-the-art performance in both domains, and allows the easy integration of diverse information sources, such as rich lexical representations.

1 Introduction

Statistical approaches to disambiguation offer the advantage of making the most likely decision on the basis of available evidence. For this purpose, a large number of probabilities have to be estimated from a training corpus. However, many possible conditioning events are not present in the training data, yielding zero Maximum Likelihood (ML) estimates. This motivates the need for smoothing methods, which re-estimate the probabilities of low-count events from more reliable estimates.
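To make the sparse data problem concrete, the following is a minimal sketch, not the method of this paper. It assumes a hypothetical toy PP-attachment training set of (verb, noun, preposition, noun) patterns and a hand-written back-off order; the names `TRAIN`, `SCHEDULE`, `ml_estimate`, and `backed_off_estimate` are illustrative only. It shows how an unseen conditioning event leaves the ML estimate without any supporting counts, and how backing off to less specific conditioning information recovers a usable estimate.

```python
from collections import Counter

# Hypothetical toy data: (verb, noun1, preposition, noun2) -> attachment site.
# "V" = the PP attaches to the verb, "N" = the PP attaches to the noun.
TRAIN = [
    (("eat", "pizza", "with", "fork"), "V"),
    (("eat", "pizza", "with", "anchovies"), "N"),
    (("see", "man", "with", "telescope"), "V"),
    (("buy", "book", "about", "history"), "N"),
]

# Assumed back-off order, from most specific to most general conditioning
# information: all four words, then verb + preposition, then the preposition
# alone, then no context at all (the class prior).
SCHEDULE = [(0, 1, 2, 3), (0, 2), (2,), ()]


def ml_estimate(context, label, keep):
    """Relative-frequency estimate of P(label | context restricted to `keep`).

    Returns None when the restricted context never occurs in TRAIN, i.e. when
    there are no counts to base an estimate on.
    """
    def project(features):
        return tuple(features[i] for i in keep)

    key = project(context)
    labels = [lab for features, lab in TRAIN if project(features) == key]
    if not labels:
        return None
    return Counter(labels)[label] / len(labels)


def backed_off_estimate(context, label, schedule=SCHEDULE):
    """Return the estimate from the first (most specific) level with counts."""
    for keep in schedule:
        p = ml_estimate(context, label, keep)
        if p is not None:
            return p, keep
    raise ValueError("no training data available")


if __name__ == "__main__":
    query = ("drink", "tea", "with", "sugar")  # pattern never seen in TRAIN
    print(ml_estimate(query, "V", (0, 1, 2, 3)))
    print(backed_off_estimate(query, "V"))
```

In this sketch the back-off order is fixed by hand, which is the kind of specification the abstract suggests feature weighting can derive automatically.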
Inductive generalization from observed to new data lies at the heart of machine-learning approaches to disambiguation. In Memory-Based Learning (MBL), an approach also referred to as Case-based, Instance-based, or Exemplar-based learning, induction is based on the use of similarity (Stanfill & Waltz, 1986; Aha et al., 1991; Cardie, 1994; Daelemans, 1995). In this paper we describe how the use of similarity between patterns embodies a solution to the sparse data problem, how it relates to backed-off smoothing methods, and what advantages it offers when combining diverse and rich
