Scientific report: "A Flexible POS Tagger Using an Automatically Acquired Language Model"

Lluís Màrquez
LSI - UPC
c/ Jordi Girona 1-3
08034 Barcelona. Catalonia
lluism@

Lluís Padró
LSI - UPC
c/ Jordi Girona 1-3
08034 Barcelona. Catalonia
padro@

Abstract

We present an algorithm that automatically learns context constraints using statistical decision trees. We then use the acquired constraints in a flexible POS tagger. The tagger is able to use information of any degree: n-grams, automatically learned context constraints, linguistically motivated manually written constraints, etc. The sources and kinds of constraints are unrestricted, and the language model can be easily extended, improving the results. The tagger has been tested and evaluated on the WSJ corpus.

1 Introduction

In NLP, it is necessary to model the language in a representation suitable for the task to be performed. The most commonly used language models are based on two main approaches. The first is the linguistic approach, in which the model is written by a linguist, generally in the form of rules or constraints (Voutilainen and Järvinen, 1995). The second is the automatic approach, in which the model is automatically obtained from corpora, either raw or annotated [1], and consists of n-grams (Garside et al., 1987; Cutting et al., 1992), rules (Hindle, 1989) or neural nets (Schmid, 1994).

In the automatic approach we can distinguish two main trends. The low-level data trend collects statistics from the training corpora in the form of n-grams, probabilities, weights, etc. The high-level data trend acquires more sophisticated information, such as context rules, constraints, or decision trees (Daelemans et al., 1996; Màrquez and Rodríguez, 1995; Samuelsson et al., 1996). The acquisition methods range from supervised-inductive-learning-from-example algorithms (Quinlan, 1986 ...

This research has been partially funded by the Spanish Research Department (CICYT) and inscribed as TIC96-1243-C03-02.

[1] When the model is obtained from annotated corpora we talk about supervised ...
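
To make the low-level data trend mentioned in the introduction more concrete, the following is a minimal sketch of collecting tag bigram statistics from an annotated corpus by relative frequency. It is not the authors' implementation: the function name `bigram_tag_probabilities`, the `<s>` sentence-start marker, and the toy corpus are all assumptions introduced here for illustration only.

```python
from collections import Counter


def bigram_tag_probabilities(tagged_sentences):
    """Estimate P(tag_i | tag_{i-1}) by relative frequency.

    `tagged_sentences` is a list of sentences, each a list of (word, tag)
    pairs. The "<s>" start marker is an assumption of this sketch.
    """
    context_counts = Counter()  # how often each tag appears as the left context
    bigram_counts = Counter()   # how often each (previous tag, tag) pair appears
    for sentence in tagged_sentences:
        tags = ["<s>"] + [tag for _, tag in sentence]
        for prev, cur in zip(tags, tags[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return {
        (prev, cur): count / context_counts[prev]
        for (prev, cur), count in bigram_counts.items()
    }


# Toy annotated corpus (hypothetical data with Penn Treebank-style tags).
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("old", "JJ"), ("dog", "NN"), ("sleeps", "VBZ")],
]
probs = bigram_tag_probabilities(corpus)
```

In this toy corpus a determiner is followed once by a noun and once by an adjective, so `probs[("DT", "NN")]` and `probs[("DT", "JJ")]` are both 0.5. A real tagger would estimate such statistics from a large corpus and would normally smooth them, which this sketch does not do.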
