Scientific report: "Web-Scale Features for Full-Scale Parsing"

Mohit Bansal and Dan Klein
Computer Science Division, University of California, Berkeley
mbansal klein @

Abstract

Counts from large corpora (like the web) can be powerful syntactic cues. Past work has used web counts to help resolve isolated ambiguities, such as binary noun-verb PP attachments and noun compound bracketings. In this work, we first present a method for generating web count features that address the full range of syntactic attachments. These features encode both surface evidence of lexical affinities and paraphrase-based cues to syntactic structure. We then integrate our features into full-scale dependency and constituent parsers. We show relative error reductions over the second-order dependency parser of McDonald and Pereira (2006), over the constituent parser of Petrov et al. (2006), and over a non-local constituent reranker.

1 Introduction

Current state-of-the-art syntactic parsers have achieved accuracies in the range of 90% F1 on the Penn Treebank, but a range of errors remain. From a dependency viewpoint, structural errors can be cast as incorrect attachments, even for constituent (phrase-structure) parsers. For example, in the Berkeley parser (Petrov et al., 2006), about 20% of the errors are prepositional phrase attachment errors, as in Figure 1, where a preposition-headed (IN) phrase was assigned an incorrect parent in the implied dependency tree.
Here, the Berkeley parser (solid blue edges) incorrectly attaches "from debt" to the noun phrase "30 billion", whereas the correct attachment (dashed gold edges) is to the verb "raising". However, there is a range of error types, as shown in Figure 2. Here, (a) is a non-canonical PP attachment ambiguity where "by yesterday afternoon" should attach to "had".

Figure 1: A PP attachment error in the parse output of the Berkeley parser on the Penn Treebank. Guess edges are in solid blue, gold edges are in dashed gold, and edges common to the guess and gold parses are in black.
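
The dependency view of attachment errors described above can be illustrated with a small sketch. It is not the authors' evaluation code: the function name `attachment_errors`, the `ROOT` marker, the five-token fragment, and its head indices are all assumptions made for illustration, loosely modeled on the "from debt" example. Each word stores the index of its parent, and an attachment error is any word whose guessed parent differs from its gold parent.

```python
from typing import List, Tuple

ROOT = -1  # parent index used for the root word (an assumption of this sketch)


def attachment_errors(
    tokens: List[str], guess_heads: List[int], gold_heads: List[int]
) -> List[Tuple[str, str, str]]:
    """Return (child, guessed parent, gold parent) for every word
    whose guessed parent differs from its gold parent."""
    if not (len(tokens) == len(guess_heads) == len(gold_heads)):
        raise ValueError("tokens and head lists must have the same length")

    def word(i: int) -> str:
        return "ROOT" if i == ROOT else tokens[i]

    return [
        (tokens[i], word(g), word(h))
        for i, (g, h) in enumerate(zip(guess_heads, gold_heads))
        if g != h
    ]


# Hypothetical fragment and head indices, chosen only to mirror the
# PP attachment example in the text; they are not taken from the treebank.
tokens = ["raising", "30", "billion", "from", "debt"]
gold_heads = [ROOT, 2, 0, 0, 3]   # "from" attaches to the verb "raising"
guess_heads = [ROOT, 2, 0, 2, 3]  # "from" attaches to the noun "billion"

errors = attachment_errors(tokens, guess_heads, gold_heads)
print(errors)  # [('from', 'billion', 'raising')]
```

For a constituent parser, the head lists would first have to be derived from the phrase-structure tree with head rules, a step this sketch leaves out.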
