Scientific report: "Minimized models and grammar-informed initialization for supertagging with highly ambiguous lexicons"

Sujith Ravi (1), Jason Baldridge (2), Kevin Knight (1)

(1) University of Southern California, Information Sciences Institute, Marina del Rey, California 90292. {sravi, knight}@
(2) Department of Linguistics, The University of Texas at Austin, Austin, Texas 78712. jbaldrid@

Abstract

We combine two complementary ideas for learning supertaggers from highly ambiguous lexicons: grammar-informed tag transitions and models minimized via integer programming. Each strategy on its own greatly improves performance over basic expectation-maximization training with a bitag Hidden Markov Model, which we show on the CCGbank and CCG-TUT corpora. The strategies provide further error reductions when combined. We describe a new two-stage integer programming strategy that efficiently deals with the high degree of ambiguity on these datasets while obtaining the full effect of model minimization.

1 Introduction

Creating accurate part-of-speech (POS) taggers using a tag dictionary and unlabeled data is an interesting task with practical applications. It has been explored at length in the literature since Merialdo (1994), though the task setting as usually defined in such experiments is somewhat artificial, since the tag dictionaries are derived from tagged corpora. Nonetheless, the methods proposed apply to realistic scenarios in which one has an electronic part-of-speech tag dictionary or a hand-crafted grammar with limited coverage.

Most work has focused on POS-tagging for English using the Penn Treebank (Marcus et al., 1993), such as Banko and Moore (2004), Goldwater and Griffiths (2007), Toutanova and Johnson (2008), Goldberg et al. (2008), and Ravi and Knight (2009). This generally involves working with the standard set of 45 POS tags employed in the Penn Treebank. The most ambiguous word has 7 different POS tags associated with it. Most methods have employed some variant of Expectation Maximization (EM) to learn parameters for a bigram or trigram Hidden Markov Model (HMM). Ravi and Knight (2009) achieved the best results thus far (in word token accuracy) via a Minimum Description Length approach using an integer program (IP) that finds a minimal bigram grammar that obeys the tag dictionary constraints and covers the observed data.
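The minimization objective mentioned in the introduction can be illustrated on a toy problem. The sketch below is not the integer program from the paper: it swaps in an exhaustive search, which is only feasible for tiny inputs, to find a tagging that obeys a tag dictionary and uses the fewest distinct tag bigrams. The dictionary, the two sentences, the `<s>` start marker, and the function names are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy tag dictionary: each word maps to the tags it may take.
TAG_DICT = {
    "the": {"DT"},
    "can": {"MD", "NN", "VB"},
    "rusts": {"NNS", "VBZ"},
    "dog": {"NN", "VB"},
    "barks": {"NNS", "VBZ"},
}

# Hypothetical unlabeled corpus.
SENTENCES = [["the", "can", "rusts"], ["the", "dog", "barks"]]


def tag_bigrams(tags):
    """Return the set of adjacent tag pairs, with an assumed start marker."""
    padded = ["<s>"] + list(tags)
    return set(zip(padded, padded[1:]))


def minimal_bigram_grammar(sentences, tag_dict):
    """Brute-force search for a tagging with the fewest distinct tag bigrams.

    Every candidate tagging respects the tag dictionary, so the returned
    grammar covers the corpus under the dictionary constraints.
    """
    # All dictionary-consistent tag sequences for each sentence.
    options = [
        list(product(*(sorted(tag_dict[word]) for word in sentence)))
        for sentence in sentences
    ]
    best_grammar, best_tagging = None, None
    for tagging in product(*options):
        grammar = set()
        for tags in tagging:
            grammar |= tag_bigrams(tags)
        if best_grammar is None or len(grammar) < len(best_grammar):
            best_grammar, best_tagging = grammar, tagging
    return best_grammar, best_tagging


if __name__ == "__main__":
    grammar, tagging = minimal_bigram_grammar(SENTENCES, TAG_DICT)
    print(sorted(grammar))
    print(tagging)
```

The search space grows exponentially with corpus size and tag ambiguity, which is why an exact solver such as an integer program is the more practical formulation. Note also that a smallest grammar is not guaranteed to give the linguistically correct tagging; the sketch shows only the objective being minimized.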
