Scientific report: "Extracting and Classifying Urdu Multiword Expressions"

Annette Hautli
Department of Linguistics, University of Konstanz, Germany

Sebastian Sulger
Department of Linguistics, University of Konstanz, Germany

Abstract

This paper describes a method for automatically extracting and classifying multiword expressions (MWEs) for Urdu on the basis of a relatively small unannotated corpus (around million tokens). The MWEs are extracted by an unsupervised method and classified into two distinct classes, namely locations and person names. The classification is based on simple heuristics that take the co-occurrence of MWEs with distinct postpositions into account. The resulting classes are evaluated against a hand-annotated gold standard and achieve an f-score of and for locations and persons, respectively. A target application is the Urdu ParGram grammar, where MWEs are needed to generate a more precise syntactic and semantic analysis.

1 Introduction

Multiword expressions (MWEs) are expressions which can be semantically and syntactically idiosyncratic in nature; acting as a single unit, their meaning is not always predictable from their components. Their identification is therefore an important task for any Natural Language Processing (NLP) application that goes beyond the analysis of pure surface structure, in particular for languages with few other NLP tools available. There is a vast amount of literature on extracting and classifying MWEs automatically; many approaches rely on already available resources that aid during the acquisition process.

In the case of the Indo-Aryan language Urdu, a lack of linguistic resources such as annotated corpora or lexical knowledge bases impedes the task of detecting and classifying MWEs. Nevertheless, statistical measures and language-specific syntactic information can be employed to extract and classify MWEs. Therefore, the method described in this paper can partly overcome the ...
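The two steps named in the abstract, unsupervised extraction by a statistical measure and classification by postposition co-occurrence, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The excerpt does not name the association measure, so pointwise mutual information (PMI) is used here as one common choice. The postposition sets (romanized "mein" and "tak" for locations, "ne" and "ko" for persons), the thresholds, the function names, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical postposition sets (romanized Urdu); the excerpt does not list them.
LOCATION_POSTPOSITIONS = {"mein", "tak"}  # "in", "up to"
PERSON_POSTPOSITIONS = {"ne", "ko"}       # ergative, dative/accusative


def extract_bigrams(tokens, min_count=2, min_pmi=1.0):
    """Return candidate two-word MWEs with their PMI, highest score first."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        pmi = math.log2(p_pair / (p_w1 * p_w2))
        if pmi >= min_pmi:
            scored.append(((w1, w2), pmi))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def classify_mwe(mwe, tokens):
    """Label an MWE by counting the postpositions that directly follow it."""
    size = len(mwe)
    location_hits = 0
    person_hits = 0
    for i in range(len(tokens) - size):
        if tuple(tokens[i:i + size]) == tuple(mwe):
            following = tokens[i + size]
            if following in LOCATION_POSTPOSITIONS:
                location_hits += 1
            elif following in PERSON_POSTPOSITIONS:
                person_hits += 1
    if location_hits > person_hits:
        return "location"
    if person_hits > location_hits:
        return "person"
    return "unknown"


# Toy romanized corpus, invented for this example.
tokens = ("nawaz sharif ne kaha new york mein barish hui "
          "nawaz sharif ko bataya new york tak safar").split()

candidates = [pair for pair, _ in extract_bigrams(tokens)]
labels = {pair: classify_mwe(pair, tokens) for pair in candidates}
```

On a real corpus the postposition sets would need tuning, since a marker such as "ko" can also follow location names. A tie or an absence of evidence yields "unknown" rather than a forced label.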




