TAILIEUCHUNG - Báo cáo khoa học: "Similarity-Based Methods For Word Sense Disambiguation"

We compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency. The similarity-based methods perform up to 40% better on this particular task. We also conclude that events that occur only once in the training set have major impact on similarity-based estimates. | Similarity-Based Methods For Word Sense Disambiguation Ido Dagan Dept of Mathematics and Computer Science Bar Ilan University Ramat Gan 52900 Israel Lillian Lee Div. of Engineering and Applied Sciences Harvard University Cambridge MA 01238 USA Fernando Pereira AT T Labs - Research 600 Mountain Ave. Murray Hill NJ 07974 USA Abstract We compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency. The similarity-based methods perform up to 40 better on this particular task. We also conclude that events that occur only once in the training set have major impact on similarity-based estimates. 1 Introduction The problem of data sparseness affects all statistical methods for natural language processing. Even large training sets tend to misrepresent low-probability events since rare events may not appear in the training corpus at all. We concentrate here on the problem of estimating the probability of unseen word pairs that is pairs that do not occur in the training set. Katz s back-off scheme Katz 1987 widely used in bigram language modeling estimates the probability of an unseen bigram by utilizing unigram estimates. This has the undesirable result of assigning unseen bigrams the same probability if they are made up of unigrams of the same frequency. Class-based methods Brown et al. 1992 Pereira Tishby and Lee 1993 Resnik 1992 cluster words into classes of similar words so that one can base the estimate of a word pair s probability on the averaged cooccurrence probability of the classes to which the two words belong. However a word is therefore modeled by the average behavior of many words which may cause the given word s idiosyncrasies to be ignored. For instance the word red might well act like a generic color word in most cases but it has distinctive .

Ðức Trí 38 8 pdf

Upload

Bấm vào đây để xem trước nội dung

Tải xuống

TÀI LIỆU LIÊN QUAN

TÀI LIỆU XEM NHIỀU

Một Case Về Hematology (1)

8 462302 61

Giới thiệu :Lập trình mã nguồn mở

14 24979 79

Tiểu luận: Tư tưởng Hồ Chí Minh về xây dựng nhà nước trong sạch vững mạnh

13 11294 542

Câu hỏi và đáp án bài tập tình huống Quản trị học

14 10514 466

Phân tích và làm rõ ý kiến sau: “Bài thơ Tự tình II vừa nói lên bi kịch duyên phận vừa cho thấy khát vọng sống, khát vọng hạnh phúc của Hồ Xuân Hương”

3 9797 108

Ebook Facts and Figures – Basic reading practice: Phần 1 – Đặng Tuấn Anh (Dịch)

249 8878 1161

Tiểu luận: Nội dung tư tưởng Hồ Chí Minh về đạo đức

16 8468 426

Mẫu đơn thông tin ứng viên ngân hàng VIB

8 8092 2279

Giáo trình Tư tưởng Hồ Chí Minh - Mạch Quang Thắng (Dành cho bậc ĐH - Không chuyên ngành Lý luận chính trị)

152 7483 1764

Đề tài: Dự án kinh doanh thời trang quần áo nữ

17 7196 268

TỪ KHÓA LIÊN QUAN

TÀI LIỆU MỚI ĐĂNG

B2B Content Marketing: 2012 Benchmarks, Budgets & Trends

17 214 3 30-11-2024

Data Structures and Algorithms - Chapter 8: Heaps

41 173 5 30-11-2024

Giáo trình phân tích phương trình vi phân viết dưới dạng thuật toán đặc tính của hệ thống p1

5 149 1 30-11-2024

Báo cáo nghiên cứu nông nghiệp " Biofertiliser inoculant technology for the growth of rice in Vietnam: Developing technical infrastructure for quality assurance and village production for farmers "

12 133 2 30-11-2024

Báo cáo nghiên cứu khoa học " HÃY LÀM CHO HUẾ XANH HƠN VÀ ĐẸP HƠN "

6 170 3 30-11-2024

Hướng dẫn chế độ dinh dưỡng cho người bệnh viêm khớp

5 160 2 30-11-2024

CHƯƠNG 2: RỦI RO THÂM HỤT TÀI KHÓA

28 152 1 30-11-2024

Báo cáo " Bàn về hành vi pháp luật và hành vi đạo đức "

11 171 2 30-11-2024

Báo cáo nghiên cứu khoa học " NÂNG QUAN HỆ KINH TẾ THƯƠNG MẠI VIỆT NAM - TRUNG QUỐC LÊN TẦM CAO THỜI ĐẠI "

8 160 1 30-11-2024

Báo cáo nghiên cứu khoa học " Đại hội XVI thông qua điều lệ Đảng cộng sản Trung Quốc những sửa đổi bổ sung mới "

4 156 1 30-11-2024

TÀI LIỆU HOT

Mẫu đơn thông tin ứng viên ngân hàng VIB

8 8092 2279

Giáo trình Tư tưởng Hồ Chí Minh - Mạch Quang Thắng (Dành cho bậc ĐH - Không chuyên ngành Lý luận chính trị)

152 7483 1764

Ebook Chào con ba mẹ đã sẵn sàng

112 4369 1369

Ebook Tuyển tập đề bài và bài văn nghị luận xã hội: Phần 1

62 6162 1259

Ebook Facts and Figures – Basic reading practice: Phần 1 – Đặng Tuấn Anh (Dịch)

249 8878 1161

Giáo trình Văn hóa kinh doanh - PGS.TS. Dương Thị Liễu

561 3797 680

Giáo trình Sinh lí học trẻ em: Phần 1 - TS Lê Thanh Vân

122 3911 609

Giáo trình Pháp luật đại cương: Phần 1 - NXB ĐH Sư Phạm

274 4623 562

Tiểu luận: Tư tưởng Hồ Chí Minh về xây dựng nhà nước trong sạch vững mạnh

13 11294 542

Bài tập nhóm quản lý dự án: Dự án xây dựng quán cafe

35 4460 490