Scientific report: "Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models"

Elias Ponvert, Jason Baldridge and Katrin Erk
Department of Linguistics, The University of Texas at Austin, Austin, TX 78712
ponvert, jbaldrid @

Abstract

We consider a new subproblem of unsupervised parsing from raw text, unsupervised partial parsing: the unsupervised version of text chunking. We show that addressing this task directly, using probabilistic finite-state methods, produces better results than relying on the local predictions of a current best unsupervised parser, Seginer's (2007) CCL. These finite-state models are combined in a cascade to produce more general (full-sentence) constituent structures; doing so outperforms CCL by a wide margin in unlabeled PARSEVAL scores for English, German and Chinese. Finally, we address the use of phrasal punctuation as a heuristic indicator of phrasal boundaries, both in our system and in CCL.

1 Introduction

Unsupervised grammar induction has been an active area of research in computational linguistics for over twenty years (Lari and Young, 1990; Pereira and Schabes, 1992; Charniak, 1993). Recent work (Headden III et al., 2009; Cohen and Smith, 2009; Hänig, 2010; Spitkovsky et al., 2010) has largely built on the dependency model with valence of Klein and Manning (2004), and is characterized by its reliance on gold-standard part-of-speech (POS) annotations: the models are trained on and evaluated using sequences of POS tags rather than raw tokens. This is also true for models which are not successors of Klein and Manning (Bod, 2006; Hänig, 2010).

An exception which learns from raw text and makes no use of POS tags is the common cover links parser (CCL; Seginer, 2007). CCL established state-of-the-art results for unsupervised constituency parsing from raw text, and it is also incremental and extremely fast for both learning and parsing. Unfortunately, CCL is a non-probabilistic algorithm based on a complex set of inter-relating heuristics and a ...
