TAILIEUCHUNG - Scientific report: "Serial Combination of Rules and Statistics: A Case Study in Czech Tagging"

Jan Hajic, Pavel Krbec, Pavel Kveton, Karel Oliva, Vladimir Petkevic

IFAL, MFF UK, Prague, Czechia
ICNC, FF UK, Prague, Czechia
Computational Linguistics, Univ. of Saarland, Germany
ITCL, FF UK, Prague, Czechia
hajic krbec @ oliva@

Abstract

A hybrid system is described which combines the strength of manual rulewriting and statistical learning, obtaining results superior to both methods if applied separately. The combination of a rule-based system and a statistical one is not parallel but serial: the rule-based system performing partial disambiguation with recall close to 100% is applied first, and a trigram HMM tagger runs on its results. An experiment in Czech tagging has been performed with encouraging results.

1 Tagging of Inflective Languages

Inflective languages pose a specific problem in tagging due to two phenomena: highly inflective nature (causing sparse data problem in any statistically-based system), and free word order (causing fixed-context systems, such as n-gram Hidden Markov Models (HMMs), to be even less adequate than for English). The average tagset contains about 1,000 - 2,000 distinct tags; the size of the set of possible and plausible tags can reach several thousands.

Apart from agglutinative languages such as Turkish, Finnish and Hungarian (see Hakkani-Tur et al., 2000) and Basque (Ezeiza et al., 1998), which pose quite different and in the end less severe problems, there have been attempts at solving this problem for some of the highly inflectional European languages, such as (Daelemans et al., 1996), (Erjavec et al., 1999) (Slovenian), (Hajic and Hladka, 1997), (Hajic and Hladka, 1998) (Czech) and (Hajic, 2000) (five Central and Eastern European languages), but so far no system has reached - in the absolute terms - a performance comparable to English tagging (such as (Ratnaparkhi, 1996)), which stands around or above 97%. For example Hajic and .
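The serial combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`apply_rules`, `viterbi`, `after_determiner`), the English toy sentence, the tag names and all probabilities are invented for illustration, and a bigram HMM stands in for the paper's trigram HMM to keep the example short. The only idea carried over from the text is the order of the two stages: the rules first remove tags they can safely exclude, and the statistical tagger then chooses only among the tags that remain.

```python
import math

def apply_rules(candidates, rules):
    """Rule stage: each rule may delete tags it judges impossible in context.
    A tag set is never emptied, so the correct tag is rarely lost (high recall)."""
    pruned = [set(c) for c in candidates]
    for rule in rules:
        for i in range(len(pruned)):
            remaining = {t for t in pruned[i] if not rule(pruned, i, t)}
            if remaining:  # never delete the last remaining candidate
                pruned[i] = remaining
    return pruned

def viterbi(words, candidates, trans, emit, floor=1e-6):
    """Statistical stage: bigram HMM decoding (a simplification of a trigram HMM),
    restricted to the candidate tags left by the rule stage."""
    if not words:
        return []

    def lp(table, key):
        # Log-probability with a small floor for unseen events (assumed smoothing).
        return math.log(table.get(key, floor))

    # best[i][t] = (log-prob of the best path ending in tag t at position i, previous tag)
    best = [{t: (lp(trans, ("<s>", t)) + lp(emit, (t, words[0])), None)
             for t in candidates[0]}]
    for i in range(1, len(words)):
        column = {}
        for t in candidates[i]:
            score, prev = max((best[i - 1][p][0] + lp(trans, (p, t)), p)
                              for p in candidates[i - 1])
            column[t] = (score + lp(emit, (t, words[i])), prev)
        best.append(column)

    # Backtrack from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

# Hypothetical rule: directly after an unambiguous determiner, drop verbal readings.
def after_determiner(sentence, i, tag):
    return i > 0 and sentence[i - 1] == {"DET"} and tag in {"VERB", "AUX"}

# Toy input: candidate tags as a morphological analyzer might supply them (invented).
words = ["the", "can", "rusts"]
candidates = [{"DET"}, {"NOUN", "VERB", "AUX"}, {"NOUN", "VERB"}]

# Toy model parameters (invented numbers, not estimated from any corpus).
trans = {("<s>", "DET"): 0.6, ("DET", "NOUN"): 0.7,
         ("NOUN", "NOUN"): 0.2, ("NOUN", "VERB"): 0.5}
emit = {("DET", "the"): 0.5, ("NOUN", "can"): 0.01,
        ("NOUN", "rusts"): 0.001, ("VERB", "rusts"): 0.01}

pruned = apply_rules(candidates, [after_determiner])  # serial step 1: rules
tags = viterbi(words, pruned, trans, emit)            # serial step 2: HMM
print(tags)  # ['DET', 'NOUN', 'VERB']
```

In this toy run the rule resolves "can" on its own, so the HMM only has to decide between the two remaining readings of "rusts". Because the rule stage never removes the last candidate, the statistical stage always has at least one tag to choose from at every position.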
