TAILIEUCHUNG - Scientific report: "Unsupervised Part-of-Speech Tagging Employing Efficient Graph Clustering"

Unsupervised Part-of-Speech Tagging Employing Efficient Graph Clustering

Chris Biemann
University of Leipzig, NLP Department
Augustusplatz 10/11, 04109 Leipzig, Germany
biem@

Abstract

An unsupervised part-of-speech (POS) tagging system that relies on graph clustering methods is described. Unlike in current state-of-the-art approaches, the kind and number of different tags is generated by the method itself. We compute and merge two partitionings of word graphs: one based on context similarity of high-frequency words, another on log-likelihood statistics for words of lower frequencies. Using the resulting word clusters as a lexicon, a Viterbi POS tagger is trained, which is refined by a morphological component. The approach is evaluated on three different languages by measuring agreement with existing taggers.

1 Introduction

Motivation

Assigning syntactic categories to words is an important pre-processing step for most NLP applications. Essentially, two things are needed to construct a tagger: a lexicon that contains tags for words and a mechanism to assign tags to running words in a text. There are words whose tags depend on their use. Further, we also need to be able to tag previously unseen words. Lexical resources have to offer the possible tags, and our mechanism has to choose the appropriate tag based on the context.
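The two ingredients named above, a lexicon that offers possible tags and a mechanism that chooses among them by context, can be illustrated with a small Viterbi decoder. This is only a minimal sketch under stated assumptions, not the paper's implementation: it uses a bigram transition model, hand-picked toy probabilities, and readable tag names (DET, NOUN, VERB), whereas an unsupervised system would use induced cluster identifiers. The names `viterbi`, `LEXICON`, `TRANS`, `TAGS`, and `floor` are hypothetical.

```python
import math

# Hypothetical toy lexicon: word -> {tag: P(word | tag)}.
LEXICON = {
    "the": {"DET": 0.5},
    "can": {"NOUN": 0.01, "VERB": 0.05},  # ambiguous word
    "rusts": {"VERB": 0.01},
}
# Hypothetical bigram transitions: (prev_tag, tag) -> P(tag | prev_tag).
TRANS = {
    ("<s>", "DET"): 0.6, ("<s>", "NOUN"): 0.2, ("<s>", "VERB"): 0.1,
    ("DET", "NOUN"): 0.8, ("DET", "VERB"): 0.01,
    ("NOUN", "VERB"): 0.6, ("VERB", "VERB"): 0.05,
}
TAGS = ["DET", "NOUN", "VERB"]


def viterbi(words, lexicon, trans, default_tags, floor=1e-6):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []

    def emissions(word):
        # Words missing from the lexicon may take any tag at a floor probability.
        return lexicon.get(word, {t: floor for t in default_tags})

    # best[i][tag] = (log-probability of the best path ending in tag, backpointer)
    best = []
    for i, word in enumerate(words):
        column = {}
        for tag, p_emit in emissions(word).items():
            if i == 0:
                score = math.log(trans.get(("<s>", tag), floor))
                column[tag] = (score + math.log(p_emit), None)
            else:
                prev_tag, prev_score = max(
                    ((pt, ps + math.log(trans.get((pt, tag), floor)))
                     for pt, (ps, _) in best[i - 1].items()),
                    key=lambda item: item[1],
                )
                column[tag] = (prev_score + math.log(p_emit), prev_tag)
        best.append(column)

    # Start from the best final tag and follow the backpointers.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi(["the", "can", "rusts"], LEXICON, TRANS, TAGS))
# prints ['DET', 'NOUN', 'VERB']
```

In this toy setting the ambiguous word "can" is resolved to NOUN because of the preceding DET, and a word absent from the lexicon is tagged purely from its context, which mirrors the two requirements stated in the text.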
Given a sufficient amount of manually tagged text, several approaches have demonstrated the ability to learn the instance of a tagging mechanism from manually labelled data and apply it successfully to unseen data. Those high-quality resources are typically unavailable for many languages, and their creation is labour-intensive. We will describe an alternative needing much less human intervention. In this work, steps are undertaken to derive a lexicon of syntactic categories from unstructured text without prior linguistic knowledge. We employ two different techniques: one for high- and medium-frequency terms, one for medium- and low-frequency
