TAILIEUCHUNG - Báo cáo khoa học: "Automatic Retrieval and Clustering of Similar Words"

Bootstrapping semantics from text is one of the greatest challenges in natural language learning. We first define a word similarity measure based on the distributional pattern of words. The similarity measure allows us to construct a thesaurus using a parsed corpus. We then present a new evaluation methodology for the automatically constructed thesaurus. The evaluation results show that the thesaurns is significantly closer to WordNet than Roget Thesaurus is. | Automatic Retrieval and Clustering of Similar Words Dekang Lin Department of Computer Science University of Manitoba Winnipeg Manitoba Canada R3T 2N2 lindek@ Abstract Bootstrapping semantics from text is one of the greatest challenges in natural language learning. We first define a word similarity measure based on the distributional pattern of words. The similarity measure allows US to construct a thesaurus using a parsed corpus. We then present a new evaluation methodology for the automatically constructed thesaurus. The evaluation results show that the thesaurus is significantly closer to WordNet than Roget Thesaurus is. 1 Introduction The meaning of an unknown word can often be inferred from its context. Consider the following slightly modified example in Nida 1975 1 A bottle of tezguino is on the table. Everyone likes tezguino. Tezguino makes you drunk. We make tezguino out of com. The contexts in which the word tezguino is used suggest that tezguino may be a kind of alcoholic beverage made from com mash. Bootstrapping semantics from text is one of the greatest challenges in natural language learning. It has been argued that similarity plays an important role in word acquisition Gentner 1982 . Identifying similar words is an initial step in learning the definition of a word. This paper presents a method for making this first step. For example given a corpus that includes the sentences in 1 our goal is to be able to infer that tezguino is similar to beer wine vodka etc. In addition to the long-term goal of bootstrapping semantics from text automatic identification of similar words has many immediate applications. The most obvious one is thesaurus construction. An automatically created thesaurus offers many advantages over manually constructed thesauri. Firstly the terms can be corpus- or genre-specific. Manually constructed general-purpose dictionaries and thesauri include many usages that are very infrequent in a particular corpus or genre

Nguyệt Anh 80 7 pdf

Upload

Bấm vào đây để xem trước nội dung

Tải xuống

TÀI LIỆU LIÊN QUAN

Báo cáo khoa học: "Automatic Retrieval and Clustering of Similar Words"

7 68 0

Automated storage and retrieval system - A new solution for storage and warehouse management in Vietnam

9 40 1

CryGetter: A tool to automate retrieval and analysis of Cry protein data

14 35 1

TÀI LIỆU XEM NHIỀU

Một Case Về Hematology (1)

8 461844 55

Giới thiệu :Lập trình mã nguồn mở

14 22508 57

Tiểu luận: Tư tưởng Hồ Chí Minh về xây dựng nhà nước trong sạch vững mạnh

13 10861 529

Câu hỏi và đáp án bài tập tình huống Quản trị học

14 10024 445

Phân tích và làm rõ ý kiến sau: “Bài thơ Tự tình II vừa nói lên bi kịch duyên phận vừa cho thấy khát vọng sống, khát vọng hạnh phúc của Hồ Xuân Hương”

3 9488 104

Ebook Facts and Figures – Basic reading practice: Phần 1 – Đặng Tuấn Anh (Dịch)

249 8241 1124

Tiểu luận: Nội dung tư tưởng Hồ Chí Minh về đạo đức

16 8199 423

Mẫu đơn thông tin ứng viên ngân hàng VIB

8 7859 2219

Đề tài: Dự án kinh doanh thời trang quần áo nữ

17 6639 253

Vật lý hạt cơ bản (1)

29 5753 85

TỪ KHÓA LIÊN QUAN

TÀI LIỆU MỚI ĐĂNG

Động cơ đốt trong và máy kéo công nghiêp tập 1 part 7

23 256 0 19-04-2024

CẤU TẠO HẠT NHÂN NGUYÊN TỬ-ĐỘ HỤT KHỐI-NĂNG LƯỢNG LIÊN KẾT-LK RIÊNG

12 262 0 19-04-2024

extremetech Hacking BlackBerry phần 9

31 239 0 19-04-2024

beginning Ubuntu Linux phần 1

34 211 1 19-04-2024

Trading Strategies Profit Making Techniques For Stock_3

23 181 0 19-04-2024

Trading Strategies Profit Making Techniques For Stock_8

23 171 0 19-04-2024

Báo cáo nghiên cứu khoa học " KẾT QUẢ NGHIÊN CỨU BƯỚC ĐẦU VỀ THIÊN ĐỊCH CHÂN KHỚP TRÊN CÂY THANH TRÀ Ở THỪA THIÊN HUẾ "

7 173 0 19-04-2024

MySQL Basics for Visual Learners PHẦN 9

15 183 0 19-04-2024

MÔN HỌC VẬT LIỆU VÀ CÔNG NGHỆ KIM LOẠI - PHẦN I: KIM LOẠI HỌC

32 175 2 19-04-2024

Lịch sử Đội TNTP Hồ Chí Minh - CHƯƠNG III VÂNG LỜI BÁC DẠY, LÀM NGHÌN VIỆC TỐT, CHỐNG MỸ, CỨU NƯỚC, THIẾU NIÊN SĂN SÀNG

45 136 0 19-04-2024

TÀI LIỆU HOT

Mẫu đơn thông tin ứng viên ngân hàng VIB

8 7859 2219

Giáo trình Tư tưởng Hồ Chí Minh - Mạch Quang Thắng (Dành cho bậc ĐH - Không chuyên ngành Lý luận chính trị)

152 5589 1325

Ebook Chào con ba mẹ đã sẵn sàng

112 3749 1228

Ebook Facts and Figures – Basic reading practice: Phần 1 – Đặng Tuấn Anh (Dịch)

249 8241 1124

Ebook Tuyển tập đề bài và bài văn nghị luận xã hội: Phần 1

62 5246 1124

Giáo trình Văn hóa kinh doanh - PGS.TS. Dương Thị Liễu

561 3471 641

Tiểu luận: Tư tưởng Hồ Chí Minh về xây dựng nhà nước trong sạch vững mạnh

13 10861 529

Giáo trình Sinh lí học trẻ em: Phần 1 - TS Lê Thanh Vân

122 3668 524

Giáo trình Pháp luật đại cương: Phần 1 - NXB ĐH Sư Phạm

274 4022 513

Bài tập nhóm quản lý dự án: Dự án xây dựng quán cafe

35 4093 478