TAILIEUCHUNG - Báo cáo khoa học: "Compiling Boostexter Rules into a Finite-state Transducer"

A number of NLP tasks have been effectively modeled as classification tasks using a variety of classification techniques. Most of these tasks have been pursued in isolation with the classifier assuming unambiguous input. In order for these techniques to be more broadly applicable, they need to be extended to apply on weighted packed representations of ambiguous input. One approach for achieving this is to represent the classification model as a weighted finite-state transducer (WFST). | Compiling Boostexter Rules into a Finite-state Transducer Srinivas Bangalore AT T Labs-Research 180 Park Avenue Florham Park NJ 07932 Abstract A number of NLP tasks have been effectively modeled as classification tasks using a variety of classification techniques. Most of these tasks have been pursued in isolation with the classifier assuming unambiguous input. In order for these techniques to be more broadly applicable they need to be extended to apply on weighted packed representations of ambiguous input. One approach for achieving this is to represent the classification model as a weighted finite-state transducer WFST . In this paper we present a compilation procedure to convert the rules resulting from an AdaBoost classifier into an WFST. We validate the compilation technique by applying the resulting WFST on a call-routing application. 1 Introduction Many problems in Natural Language Processing NLP can be modeled as classification tasks either at the word or at the sentence level. For example part-of-speech tagging named-entity identification supertagging1 word sense disambiguation are tasks that have been modeled as classification problems at the word level. In addition there are problems that classify the entire sentence or document into one of a set of categories. These problems are loosely characterized as semantic classification and have been used in many practical applications including call routing and text classification. Most of these problems have been addressed in isolation assuming unambiguous one-best input. Typically however in NLP applications these modules are chained together with each module introducing some amount of error. In order to alleviate the errors introduced by a module it is typical for a module to provide multiple weighted solutions ideally as a packed representation that serve as input to the next module. For example a speech recognizer provides a lattice of possible recognition outputs that is to be annotated with part-of-speech

TAILIEUCHUNG - Chia sẻ tài liệu không giới hạn
Địa chỉ : 444 Hoang Hoa Tham, Hanoi, Viet Nam
Website : tailieuchung.com
Email : tailieuchung20@gmail.com
Tailieuchung.com là thư viện tài liệu trực tuyến, nơi chia sẽ trao đổi hàng triệu tài liệu như luận văn đồ án, sách, giáo trình, đề thi.
Chúng tôi không chịu trách nhiệm liên quan đến các vấn đề bản quyền nội dung tài liệu được thành viên tự nguyện đăng tải lên, nếu phát hiện thấy tài liệu xấu hoặc tài liệu có bản quyền xin hãy email cho chúng tôi.
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.