Scientific paper: "A Study on Automatically Extracted Keywords in Text Categorization"

Anette Hulth and Beata B. Megyesi
Department of Linguistics and Philology, Uppsala University, Sweden
bea@

Abstract

This paper presents a study of whether and how automatically extracted keywords can be used to improve text categorization. In summary, we show that higher performance, as measured by micro-averaged F-measure on a standard text categorization collection, is achieved when the full-text representation is combined with the automatically extracted keywords. The combination is obtained by giving higher weights to words in the full texts that are also extracted as keywords. We also present results for experiments in which the keywords are the only input to the categorizer, represented either as unigrams or intact. Of these two experiments, the unigram representation performs better, although neither performs as well as using the headlines only.

1 Introduction

Automatic text categorization is the task of assigning any of a set of predefined categories to a document. The prevailing approach is supervised machine learning, in which an algorithm is trained on documents with known categories. Before any learning can take place, the documents must be represented in a form that is understandable to the learning algorithm.
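The abstract reports results as micro-averaged F-measure. Because a document can receive several categories, micro-averaging pools true positives, false positives and false negatives over every document and category before computing precision and recall, so frequent categories dominate the score. The following is a minimal Python sketch of that computation, assuming gold and predicted category assignments are given as one set of labels per document; the function name `micro_f_measure` and the toy labels are illustrative only, not taken from the paper.

```python
from typing import List, Set


def micro_f_measure(gold: List[Set[str]], predicted: List[Set[str]],
                    beta: float = 1.0) -> float:
    """Micro-averaged F-measure for multi-label category assignments.

    Counts are summed over all documents and categories first, then
    precision and recall are computed once from the pooled counts.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same documents")
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # categories correctly assigned
        fp += len(p - g)  # categories assigned but not in the gold standard
        fn += len(g - p)  # gold categories the classifier missed
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Toy example with hypothetical labels: tp=2, fp=1, fn=2.
    gold = [{"A", "B"}, {"C"}, {"D"}]
    pred = [{"A"}, {"C", "D"}, set()]
    print(micro_f_measure(gold, pred))
```

In the toy example, precision is 2/3 and recall is 1/2, giving an F-measure of 4/7. A macro-averaged variant would instead compute F per category and average those values, weighting rare categories equally with common ones.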
A trained prediction model is subsequently applied to previously unseen documents to assign the categories. To perform a text categorization task, two major decisions must be made: how to represent the text, and which learning algorithm to use to create the prediction model. The decision about representation is in turn divided into two sub-questions: which features to select as input, and which type of value to assign to these features. In most studies, the best-performing representation consists of the full-length text with the tokens in the document kept separate, that is, as unigrams. In recent years, however, a number of experiments have been …
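To make the two representation sub-questions concrete (which features, and which values), here is a minimal Python sketch of the representations named in the abstract: full-text unigrams whose values are boosted when the word also appears in an extracted keyword, and keyword-only input as either unigrams or intact keywords. It assumes a simple regex tokenizer, raw term counts as feature values, and a hypothetical boost factor of 2.0; this excerpt does not state the authors' tokenizer, weighting scheme or boost value, and all function names are illustrative.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Set

# Hypothetical boost factor: the paper gives keyword words "higher weights",
# but the actual value is not stated in this excerpt.
KEYWORD_BOOST = 2.0


def tokenize(text: str) -> List[str]:
    """Split text into lower-cased unigram tokens (simple regex tokenizer, an assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_tokens(keywords: Iterable[str]) -> Set[str]:
    """Collect the unigrams occurring in any extracted keyword (keywords may be multi-word)."""
    return {tok for kw in keywords for tok in tokenize(kw)}


def full_text_with_keywords(text: str, keywords: Iterable[str],
                            boost: float = KEYWORD_BOOST) -> Dict[str, float]:
    """Unigram term-frequency vector of the full text, in which words that were
    also extracted as keywords have their counts multiplied by `boost`."""
    boosted = keyword_tokens(keywords)
    counts = Counter(tokenize(text))
    return {tok: n * (boost if tok in boosted else 1.0) for tok, n in counts.items()}


def keywords_only(keywords: Iterable[str], intact: bool = False) -> Dict[str, int]:
    """Keyword-only representation: the keywords split into unigrams, or each
    keyword kept intact as a single (whitespace-normalized) feature."""
    if intact:
        return dict(Counter(" ".join(tokenize(kw)) for kw in keywords))
    return dict(Counter(tok for kw in keywords for tok in tokenize(kw)))


if __name__ == "__main__":
    doc = "Wheat prices rose as wheat exports fell."
    kws = ["wheat exports", "prices"]  # hypothetical extracted keywords
    print(full_text_with_keywords(doc, kws))  # "wheat" -> 4.0, "prices" -> 2.0
    print(keywords_only(kws))                 # unigram features
    print(keywords_only(kws, intact=True))    # intact keyword features
```

Multiplying counts keeps every full-text word as a feature while letting the keyword extractor act as a relevance signal, which matches the combination described in the abstract. In practice the counts would typically be replaced by a weighting such as tf-idf before being passed to the learning algorithm.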