Scientific report: "A Combination of Active Learning and Semi-supervised Learning Starting with Positive and Unlabeled Examples for Word Sense Disambiguation: An Empirical Study on Japanese Web Search Query"

Makoto Imamura and Yasuhiro Takayama
Information Technology R&D Center, Mitsubishi Electric Corporation
5-1-1 Ofuna, Kamakura, Kanagawa, Japan
hiro@ea .

Nobuhiro Kaji, Masashi Toyoda and Masaru Kitsuregawa
Institute of Industrial Science, The University of Tokyo
4-6-1 Komaba, Meguro-ku, Tokyo

Abstract

This paper addresses the bottleneck of obtaining training data for word sense disambiguation (WSD) in the domain of web queries, where the complete set of senses of an ambiguous word is unknown. We present a method that combines active learning and semi-supervised learning to handle the case where only positive examples, that is, web search results in which the word has the expected sense, are given. The novelty of our approach is the use of “pseudo negative examples” with reliable confidence scores estimated by a classifier trained on positive and unlabeled examples. We show experimentally that the proposed method achieves WSD accuracy close to that of a method using manually prepared negative examples on several Japanese Web search data sets.

1 Introduction

In Web mining for sentiment or reputation analysis, reliable analysis requires extracting a large amount of text about particular products, shops, or persons with high accuracy. When retrieving texts from a Web archive, we often suffer from word sense ambiguity, so a WSD system is indispensable. For instance, when we tried to analyze the reputation of Loft, the name of a variety store chain in Japan, we found that a simple text search retrieved many unrelated texts that contain Loft in different senses, such as an attic room, the angle of a golf club face, a movie title, the name of a club with live music, and so on. The words in Web search queries are often proper nouns. It is therefore not trivial to discriminate these ...
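The abstract's central idea, choosing pseudo negative examples from unlabeled data by the confidence of a classifier trained on positive and unlabeled examples, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the naive Bayes scorer, the function names (`train_nb`, `positive_confidence`, `select_pseudo_negatives`), the `threshold` value, and the toy Loft snippets are all assumptions made for illustration, and the active learning loop is omitted.

```python
import math
from collections import Counter


def train_nb(docs_by_label):
    """Train a multinomial naive Bayes model with add-one smoothing.

    docs_by_label maps a label to a list of token lists.
    """
    vocab = {tok for docs in docs_by_label.values() for doc in docs for tok in doc}
    total_docs = sum(len(docs) for docs in docs_by_label.values())
    model = {}
    for label, docs in docs_by_label.items():
        counts = Counter(tok for doc in docs for tok in doc)
        denom = sum(counts.values()) + len(vocab)
        model[label] = {
            "prior": math.log(len(docs) / total_docs),
            "loglik": {tok: math.log((counts[tok] + 1) / denom) for tok in vocab},
        }
    return model


def positive_confidence(model, doc):
    """Return the model's estimate of P("pos" | doc); unseen tokens are skipped."""
    scores = {}
    for label, params in model.items():
        loglik = params["loglik"]
        scores[label] = params["prior"] + sum(loglik[t] for t in doc if t in loglik)
    top = max(scores.values())
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    return weights["pos"] / sum(weights.values())


def select_pseudo_negatives(positives, unlabeled, threshold=0.2):
    """Treat unlabeled data as a provisional negative class, then keep the
    unlabeled documents whose positive confidence is at most `threshold`
    (an assumed cut-off, not a value from the paper)."""
    model = train_nb({"pos": positives, "unl": unlabeled})
    return [doc for doc in unlabeled if positive_confidence(model, doc) <= threshold]


# Toy data (hypothetical): the expected sense is Loft, the store chain.
positives = [
    ["loft", "store", "shopping", "goods"],
    ["loft", "store", "stationery", "shop"],
]
unlabeled = [
    ["loft", "golf", "club", "angle"],
    ["loft", "attic", "room", "house"],
    ["loft", "store", "goods", "sale"],
]

pseudo_negatives = select_pseudo_negatives(positives, unlabeled)
for doc in pseudo_negatives:
    print(doc)
```

On this toy data the golf and attic snippets fall below the threshold and are kept as pseudo negatives, while the store snippet does not. In a fuller pipeline these pseudo negatives would be combined with the positives to train the WSD classifier that the active learning step then refines.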
