Scientific report: "High Precision Treebanking: Blazing Useful Trees Using POS Information"

Takaaki Tanaka, Francis Bond, Stephan Oepen, Sanae Fujita
NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation; Universitetet i Oslo and CSLI, Stanford

Abstract

In this paper we present a quantitative and qualitative analysis of annotation in the Hinoki treebank of Japanese, and investigate a method of speeding annotation by using part-of-speech tags. The Hinoki treebank is a Redwoods-style treebank of Japanese dictionary definition sentences. 5,000 sentences are annotated by three different annotators and the agreement evaluated. An average agreement of was found using strict agreement, and using labeled precision. Exploiting POS tags allowed the annotators to choose the best parse with fewer decisions.

1 Introduction

It is important for an annotated corpus that the markup is both correct and, in cases where variant analyses could be considered correct, consistent. Considerable research in the field of word sense disambiguation has concentrated on showing that the annotation of word senses can be done correctly and consistently, with the normal measure being inter-annotator agreement (Kilgarriff and Rosenzweig, 2000). Surprisingly few such studies have been carried out for syntactic annotation, with the notable exceptions of Brants et al. (2003, p. 82) for the German NeGra Corpus and Civit et al. (2003) for the Spanish Cast3LB corpus. Even such valuable and widely used corpora as the Penn TreeBank have not been verified in this way.

We are constructing the Hinoki treebank as part of a larger project in cognitive and computational linguistics ultimately aimed at natural language understanding (Bond et al., 2004). In order to build the initial syntactic and semantic models, we are treebanking the dictionary definition sentences of the most familiar 28,000 words of Japanese and building an ontology from
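
The two agreement measures named in the abstract can be illustrated with a minimal sketch. It assumes that strict agreement means two annotators selected the identical tree for a sentence, and that labeled precision is the PARSEVAL-style share of one annotator's labeled brackets that also appear in the other's tree. The function names and the (label, start, end) bracket encoding are hypothetical choices made for illustration, not the authors' implementation.

```python
from collections import Counter
from typing import Hashable, List, Tuple

# Assumed encoding: a labeled constituent is (label, start, end) over token positions.
Bracket = Tuple[str, int, int]


def strict_agreement(choices_a: List[Hashable], choices_b: List[Hashable]) -> float:
    """Fraction of sentences for which both annotators picked the same tree."""
    if len(choices_a) != len(choices_b):
        raise ValueError("annotators must cover the same sentences")
    if not choices_a:
        return 0.0
    same = sum(1 for a, b in zip(choices_a, choices_b) if a == b)
    return same / len(choices_a)


def labeled_precision(tree_a: List[Bracket], tree_b: List[Bracket]) -> float:
    """Share of annotator A's labeled brackets that also occur in B's tree."""
    if not tree_a:
        return 0.0
    count_a, count_b = Counter(tree_a), Counter(tree_b)
    # Count each bracket of A as matched at most as often as it occurs in B.
    matched = sum(min(n, count_b[bracket]) for bracket, n in count_a.items())
    return matched / sum(count_a.values())


# Hypothetical tree identifiers chosen by two annotators for four sentences.
picks_a = ["t1", "t2", "t3", "t4"]
picks_b = ["t1", "t2", "t9", "t4"]
print(strict_agreement(picks_a, picks_b))  # 0.75

# Hypothetical bracketings of one three-token sentence.
tree_a = [("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)]
tree_b = [("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3), ("NP", 0, 1)]
print(labeled_precision(tree_a, tree_b))  # 0.5
```

Under these assumptions, strict agreement gives no credit for trees that differ in a single bracket, whereas labeled precision rewards partial overlap, so the second measure will generally be the higher of the two.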
