An Endogeneous Corpus-Based Method for Structural Noun Phrase Disambiguation

Didier Bourigault
Centre d'Analyse et de Mathématiques Sociales, EHESS - Paris Sorbonne - CNRS
and
Électricité de France - Direction des Etudes et Recherches
Service Informatique et Mathématiques Appliquées
1 avenue du Général de Gaulle
92141 Clamart Cedex
FRANCE

Abstract

In this paper, we describe a method for structural noun phrase disambiguation that relies mainly on the examination of the text corpus under analysis and does not need to integrate any domain-dependent lexico- or syntactico-semantic information. This method is implemented in the terminology extraction software LEXTER. We first explain why integrating LEXTER into the LEXTER-K project, which aims at building a tool for knowledge extraction from large technical text corpora, requires improving the quality of the terminology extracted by LEXTER. Then we briefly describe how LEXTER works and show what kind of disambiguation it has to perform when parsing maximal-length noun phrases.
We introduce a method of disambiguation that relies on a very simple idea: whenever LEXTER has to choose among several competing noun sub-groups in order to disambiguate a maximal-length noun phrase, it checks, for each of these sub-groups, whether it occurs anywhere else in the corpus in a non-ambiguous situation, and then it makes a choice. The analysis of a half-million-word corpus resulted in an efficient disambiguation strategy. The average rates are 27% no disambiguation, 70% correct disambiguation and 3% wrong disambiguation.

1 The LEXTER-K project: knowledge extraction from large technical text corpora

LEXTER is a Terminology Extraction Software (Bourigault 1992a, 1992b). A corpus of French-language texts on any technical subject is fed in. LEXTER performs a grammatical analysis of this corpus and yields a list of noun phrases which are likely to be terminological units representing the concepts of the subject field. This list, together with the corpus it has …
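The disambiguation idea summarized in the abstract (look for each competing sub-group elsewhere in the corpus in a non-ambiguous situation, then choose) can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not LEXTER's implementation: the function names `candidate_subgroups`, `disambiguate` and `outcome_rates`, the restriction to the French pattern "N1 P1 N2 P2 N3", the representation of non-ambiguous occurrences as a `Counter` of sub-groups, the decision rule "choose a candidate only when it is the single one attested", and the toy phrase "circuit de refroidissement du réacteur" are all hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def candidate_subgroups(n1: str, p1: str, n2: str, p2: str, n3: str) -> list[str]:
    """Competing sub-groups for an ambiguous "N1 P1 N2 P2 N3" phrase (hypothetical helper).

    Reading A: [N1 P1 N2] P2 N3  -> sub-group "N1 P1 N2"
    Reading B: N1 P1 [N2 P2 N3]  -> sub-group "N2 P2 N3"
    """
    return [f"{n1} {p1} {n2}", f"{n2} {p2} {n3}"]


def disambiguate(candidates: Sequence[str], unambiguous: Counter) -> Optional[str]:
    """Pick the candidate attested elsewhere in a non-ambiguous situation.

    Assumed rule (illustrative only): if exactly one candidate is attested,
    choose it; if none or several are attested, make no choice (None).
    A Counter returns 0 for unseen keys, so unattested sub-groups are skipped.
    """
    attested = [c for c in candidates if unambiguous[c] > 0]
    return attested[0] if len(attested) == 1 else None


def outcome_rates(predicted: Sequence[Optional[str]], gold: Sequence[str]) -> dict[str, float]:
    """Fractions of 'none', 'correct' and 'wrong' decisions against gold sub-groups."""
    total = len(gold)
    none = sum(p is None for p in predicted)
    correct = sum(p is not None and p == g for p, g in zip(predicted, gold))
    return {
        "none": none / total,
        "correct": correct / total,
        "wrong": (total - none - correct) / total,
    }


if __name__ == "__main__":
    # Toy stand-in for sub-groups found in non-ambiguous contexts of a corpus.
    unambiguous = Counter(
        ["circuit de refroidissement", "circuit de refroidissement", "pompe de secours"]
    )
    cands = candidate_subgroups("circuit", "de", "refroidissement", "du", "réacteur")
    print(cands)  # → ['circuit de refroidissement', 'refroidissement du réacteur']
    print(disambiguate(cands, unambiguous))  # → circuit de refroidissement
```

With the toy counts, only "circuit de refroidissement" is attested in a non-ambiguous context, so the sketch attaches "du réacteur" to it. When both or neither candidate is attested it returns `None`, which corresponds to the "no disambiguation" outcome whose rate the abstract reports; `outcome_rates` shows how the three rates could be tallied against a hand-checked sample.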
