Scientific report: "Measures of Distributional Similarity"

Measures of Distributional Similarity

Lillian Lee
Department of Computer Science, Cornell University, Ithaca, NY 14853-7501

Abstract

We study distributional similarity measures for the purpose of improving probability estimation for unseen cooccurrences. Our contributions are three-fold: an empirical comparison of a broad range of measures; a classification of similarity functions based on the information that they incorporate; and the introduction of a novel function that is superior at evaluating potential proxy distributions.

1 Introduction

An inherent problem for statistical methods in natural language processing is that of sparse data: the inaccurate representation in any training corpus of the probability of low-frequency events. In particular, reasonable events that happen not to occur in the training set may mistakenly be assigned a probability of zero. These unseen events generally make up a substantial portion of novel data; for example, Essen and Steinbiss (1992) report that 12% of the test-set bigrams in a 75%-25% split of one million words did not occur in the training partition.

We consider here the question of how to estimate the conditional cooccurrence probability $P(v|n)$ of an unseen word pair $(n, v)$ drawn from some finite set $N \times V$. Two state-of-the-art technologies are Katz's (1987) backoff method and Jelinek and Mercer's (1980) interpolation method. Both use $P(v)$ to estimate $P(v|n)$ when $(n, v)$ is unseen, essentially ignoring the identity of $n$.
An alternative approach is distance-weighted averaging, which arrives at an estimate for unseen cooccurrences by combining estimates for cooccurrences involving similar words:

$$\hat{P}(v|n) = \frac{\sum_{m \in S(n)} \mathrm{sim}(n, m)\, P(v|m)}{\sum_{m \in S(n)} \mathrm{sim}(n, m)} \qquad (1)$$

where $S(n)$ is a set of candidate similar words and $\mathrm{sim}(n, m)$ is a function of the similarity between $n$ and $m$. We focus on distributional rather than semantic similarity (Resnik, 1995) because the goal of distance-weighted averaging is to smooth probability ...
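To make the distance-weighted average concrete, here is a minimal Python sketch of the estimate: a similarity-weighted mean of $P(v|m)$ over the candidate words $m$ in $S(n)$. The function name, the dictionary layout for the conditional probabilities, and the toy words and similarity weights are all illustrative assumptions; none of them comes from the paper, which leaves the choice of similarity function open.

```python
from typing import Callable, Dict, Iterable


def distance_weighted_estimate(
    n: str,
    v: str,
    candidates: Iterable[str],
    sim: Callable[[str, str], float],
    cond_prob: Dict[str, Dict[str, float]],
) -> float:
    """Estimate P(v | n) as the sim-weighted average of P(v | m) for m in S(n).

    `candidates` plays the role of S(n); `cond_prob[m][v]` holds P(v | m).
    Both are hypothetical data structures chosen for this sketch.
    """
    numerator = 0.0
    denominator = 0.0
    for m in candidates:
        weight = sim(n, m)
        # A missing entry means the pair (m, v) was never observed: P(v | m) = 0.
        numerator += weight * cond_prob.get(m, {}).get(v, 0.0)
        denominator += weight
    if denominator == 0.0:
        raise ValueError("similarity weights sum to zero; no usable neighbors")
    return numerator / denominator


# Toy data (made up for illustration): "orange" was never seen with "eat".
toy_cond_prob = {
    "apple": {"eat": 0.5, "peel": 0.5},
    "pear": {"eat": 0.25, "peel": 0.75},
}
toy_weights = {("orange", "apple"): 3.0, ("orange", "pear"): 1.0}


def toy_sim(n: str, m: str) -> float:
    # Stand-in similarity function; a real one would compare distributions.
    return toy_weights.get((n, m), 0.0)


estimate = distance_weighted_estimate(
    "orange", "eat", ["apple", "pear"], toy_sim, toy_cond_prob
)
print(estimate)  # (3.0 * 0.5 + 1.0 * 0.25) / (3.0 + 1.0) = 0.4375
```

Because the weights are normalized by their sum, the estimates for a fixed $n$ still sum to one over $v$ whenever each neighbor's distribution does, which is what lets the result serve as a smoothed probability.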
