Scientific report: "Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations"

Emily Pitler
Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104
epitler@

Abstract

Prepositions and conjunctions are two of the largest remaining bottlenecks in parsing. Across various existing parsers, these two categories have the lowest accuracies, and mistakes made have consequences for downstream applications. Prepositions and conjunctions are often assumed to depend on lexical dependencies for correct resolution. As lexical statistics based on the training set only are sparse, unlabeled data can help ameliorate this sparsity problem. By including unlabeled data features into a factorization of the problem which matches the representation of prepositions and conjunctions, we achieve a new state-of-the-art for English dependencies with [...] correct attachments on the current standard. Furthermore, conjunctions are attached with an accuracy of [...] and prepositions with an accuracy of [...].

1 Introduction

Prepositions and conjunctions are two large remaining bottlenecks in parsing. Across various existing parsers, these two categories have the lowest accuracies, and mistakes made on these have consequences for downstream applications. Machine translation is sensitive to parsing errors involving prepositions and conjunctions, because in some languages different attachment decisions in the parse of the source language sentence produce different translations. Preposition attachment mistakes are particularly bad when translating into Japanese (Schwartz et al., 2003), which uses a different postposition for different attachments; conjunction mistakes can cause word ordering mistakes when translating into Chinese (Huang, 1983).

Prepositions and conjunctions are often assumed to depend on lexical dependencies for correct resolution (Jurafsky and Martin, 2008). However, lexical statistics based on the training set only are typically sparse and have only a small
