Scientific report: "Teaching a Weaker Classifier: Named Entity Recognition on Upper Case Text"

Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 481-488.

Hai Leong Chieu, DSO National Laboratories, 20 Science Park Drive, Singapore 118230, chaileon@
Hwee Tou Ng, Department of Computer Science, School of Computing, National University of Singapore, 3 Science Drive 2, Singapore 117543, nght@

Abstract

This paper describes how a machine-learning named entity recognizer (NER) on upper case text can be improved by using a mixed case NER and some unlabeled text. The mixed case NER can be used to tag some unlabeled mixed case text, which is then used as additional training material for the upper case NER. We show that this approach substantially reduces the performance gap between the mixed case NER and the upper case NER, by 39% for MUC-6 and 22% for MUC-7 named entity test data. Our method is thus useful in improving the accuracy of NERs on upper case text, such as transcribed text from automatic speech recognizers, where case information is missing.

1 Introduction

In this paper, we propose using a mixed case named entity recognizer (NER) that is trained on labeled text to further train an upper case NER.
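The training scheme proposed here can be sketched in a few lines. This is only a toy illustration under stated assumptions, not the authors' system. `toy_mixed_case_tagger` is a hypothetical rule-based stand-in for a trained mixed case NER, the single `NAME` label replaces the real entity classes, and the "student" is a simple lookup table rather than a learned classifier. The sketch shows the data flow only: tag unlabeled mixed case text with the stronger tagger, upper-case the tokens while keeping the labels, and train the upper case tagger on the result.

```python
from collections import Counter, defaultdict

def toy_mixed_case_tagger(tokens):
    """Hypothetical stand-in for a trained mixed case NER (an assumption,
    not the paper's model): capitalized tokens that are not sentence-initial
    are labeled NAME, everything else O."""
    return ["NAME" if i > 0 and tok[:1].isupper() else "O"
            for i, tok in enumerate(tokens)]

def make_upper_case_training_data(unlabeled_sentences, tagger):
    """Tag mixed case sentences with the teacher, then upper-case the
    tokens while keeping the teacher's labels."""
    data = []
    for tokens in unlabeled_sentences:
        labels = tagger(tokens)
        data.append(([tok.upper() for tok in tokens], labels))
    return data

def train_toy_upper_case_tagger(training_data):
    """Toy student: remembers the most frequent label seen for each
    upper case token and predicts O for unseen tokens."""
    counts = defaultdict(Counter)
    for tokens, labels in training_data:
        for tok, lab in zip(tokens, labels):
            counts[tok][lab] += 1
    lexicon = {tok: c.most_common(1)[0][0] for tok, c in counts.items()}
    return lambda tokens: [lexicon.get(tok, "O") for tok in tokens]

# Made-up unlabeled mixed case sentences, for illustration only.
unlabeled = [
    "She is a candidate to lead the Securities and Exchange Commission".split(),
    "The commission met in Washington".split(),
]
auto_labeled = make_upper_case_training_data(unlabeled, toy_mixed_case_tagger)
upper_tagger = train_toy_upper_case_tagger(auto_labeled)
prediction = upper_tagger("SHE VISITED WASHINGTON".split())
```

In a realistic setting the automatically labeled sentences would be added to the existing labeled training data rather than replace it, which is what "additional training material" in the abstract suggests.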
In the Sixth and Seventh Message Understanding Conferences (MUC-6, 1995; MUC-7, 1998), the named entity task consists of labeling named entities with the classes PERSON, ORGANIZATION, LOCATION, DATE, TIME, MONEY, and PERCENT. We conducted experiments on upper case named entity recognition and showed how unlabeled mixed case text can be used to improve the results of an upper case NER on the official MUC-6 and MUC-7 test data.

Mixed Case: Consuela Washington, a longtime House staffer and an expert in securities laws, is a leading candidate to be chairwoman of the Securities and Exchange Commission in the Clinton administration.

Upper Case: CONSUELA WASHINGTON, A LONGTIME HOUSE STAFFER AND AN EXPERT IN SECURITIES LAWS, IS A LEADING CANDIDATE TO BE ...
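The sentence pair above shows why upper case text is harder: capitalization is a strong cue for names, and it disappears once everything is upper-cased. The short sketch below makes this concrete with an initial-capital flag. The flag is an assumed, illustrative feature; this excerpt does not list the features the authors' NER actually uses.

```python
def init_caps_flags(sentence):
    """Return (token, flag) pairs, where flag is True if the token starts
    with a capital letter followed only by lowercase letters.
    Illustrative feature, not taken from the paper."""
    return [(tok, tok[:1].isupper() and tok[1:].islower())
            for tok in sentence.split()]

mixed = "Consuela Washington a longtime House staffer"
upper = mixed.upper()

mixed_flags = init_caps_flags(mixed)   # flag separates names from other words
upper_flags = init_caps_flags(upper)   # flag is False for every token
```

On the mixed case fragment the flag singles out "Consuela", "Washington", and "House". On the upper case version it is False for every token, so a tagger has to rely on other evidence such as the words themselves and their context.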
