Scientific report: "Analysis of Selective Strategies to Build a Dependency-Analyzed Corpus"

Kiyonori Ohtake
National Institute of Information and Communications Technology (NICT), ATR Spoken Language Communication Research Labs., 2-2-2 Hikaridai, Keihanna Science City, Kyoto 619-0288, Japan

Abstract

This paper discusses sampling strategies for building a dependency-analyzed corpus and analyzes them with different kinds of corpora. We used the Kyoto Text Corpus, a dependency-analyzed corpus of newspaper articles, and prepared the IPAL corpus, a dependency-analyzed corpus of example sentences in dictionaries, as a new and different kind of corpus. The experimental results revealed that the length of the test set controlled the accuracy and that the longest-first strategy was good for an expanding corpus, but this was not the case when constructing a corpus from scratch.

1 Introduction

Dependency-structure analysis plays a very important role in natural language processing (NLP). Thus far, much research has been done on this subject, and many analyzers have been developed, such as rule-based analyzers and corpus-based analyzers that use machine-learning techniques. However, the maximum accuracy achieved by state-of-the-art analyzers is almost 90% for newspaper articles, and it seems very difficult to exceed this figure.
To improve our analyzers, we have to write more rules for rule-based analyzers or prepare more corpora for corpus-based analyzers. If we take a machine-learning approach, it is important to consider what features are used. However, there are several machine-learning techniques, such as support vector machines (SVMs) with a kernel function, that have strong generalization ability and are very robust to the choice of features. If we use such techniques, we are largely freed from choosing a feature set, because it becomes possible to use all possible features with little or no decline in performance. Actually, Sasano tried to expand the feature set for a
