TAILIEUCHUNG - Scientific report: "Word Alignment and Cross-Lingual Resource Acquisition"

Word Alignment and Cross-Lingual Resource Acquisition

Carol Nichols and Rebecca Hwa
Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260
cln23 hwa @

This work has been supported in part by the CRAW Distributed Mentor Program. We thank Karina Ivanetich, David Chiang, and the NLP group at Pitt for helpful feedback on the user interfaces; Wanwan Zhang and Ying-Ju Suen for testing the system; and the anonymous reviewers for their comments on the paper.

Abstract

Annotated corpora are valuable resources for developing Natural Language Processing applications. This work focuses on acquiring annotated data for multilingual processing applications. We present an annotation environment that supports a web-based user interface for acquiring word alignments between English and Chinese, as well as a visualization tool for researchers to explore the annotated data.

1 Introduction

The performance of many Natural Language Processing (NLP) applications can be improved through supervised machine learning techniques that train systems with annotated training examples. For example, a part-of-speech (POS) tagger might be induced from words that have been annotated with the correct POS tags. A limitation of the supervised approach is that the annotation is typically performed manually. This poses a challenge in three ways.

First, researchers must develop a comprehensive annotation guideline for the annotators to follow. Guideline development is difficult because researchers must be specific enough that different annotators' work will be comparable, but also general enough to allow the annotators to make their own linguistic judgments. Reported experiences of previous annotation projects suggest that guideline development is both an art and a science, and is itself a time-consuming process (Litman and Pan, 2002; Marcus et al., 1993; Xia et al., 2000; Wiebe, 2002).

Second, it is common for the annotators to make mistakes, so some form of consistency check is necessary.

Third
