Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing

Valentin I. Spitkovsky, Computer Science Department, Stanford University and Google Inc., valentin@
Daniel Jurafsky, Departments of Linguistics and Computer Science, Stanford University, jurafsky@
Hiyan Alshawi, Google Inc., hiyan@

Abstract

We show how web mark-up can be used to improve unsupervised dependency parsing. Starting from raw bracketings of four common HTML tags (anchors, bold, italics and underlines), we refine approximate partial phrase boundaries to yield accurate parsing constraints. Conversion procedures fall out of our linguistic analysis of a newly available million-word hyper-text corpus. We demonstrate that derived constraints aid grammar induction by training Klein and Manning’s Dependency Model with Valence (DMV) on this data set: parsing accuracy on Section 23 (all sentences) of the Wall Street Journal corpus jumps to a level that beats the previous state of the art by more than 5%. Web-scale experiments show that the DMV, perhaps because it is unlexicalized, does not benefit from orders of magnitude more annotated but noisier data. Our model, trained on a single blog, generalizes out-of-domain, reaching an accuracy on the Brown corpus nearly 10% higher than the previous published best.
The fact that web mark-up strongly correlates with syntactic structure may have broad applicability in NLP.

1 Introduction

Unsupervised learning of hierarchical syntactic structure from free-form natural language text is a hard problem whose eventual solution promises to benefit applications ranging from question answering to speech recognition and machine translation. A restricted version of this problem, which targets dependencies and assumes partial annotation (sentence boundaries and part-of-speech (POS) tagging), has received much attention. Klein and Manning (2004) were the first to beat a simple parsing heuristic, the right-branching baseline; today's state-of-the-art systems (Headden et al., 2009; Cohen
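The abstract's starting point, raw bracketings taken from four common HTML tags, can be made concrete with a short sketch. The Python below is a hypothetical illustration and not the authors' conversion procedure. It makes several assumptions. It uses naive whitespace tokenization. It treats `<a>`, `<b>`, `<i>` and `<u>` as the anchor, bold, italic and underline tags. It records each tagged region as a half-open token span. The names `MarkupBracketer` and `markup_bracketings` are invented for this example. The sketch also does not attempt the refinement of these raw spans into parsing constraints that the abstract describes.

```python
from html.parser import HTMLParser

# Assumed mapping of the four tags named in the abstract:
# anchors, bold, italics, underlines.
MARKUP_TAGS = {"a", "b", "i", "u"}


class MarkupBracketer(HTMLParser):
    """Hypothetical helper: collect whitespace tokens and mark-up spans.

    Each span is a (tag, start, end) triple of half-open token indices.
    """

    def __init__(self):
        super().__init__()
        self.tokens = []   # whitespace-separated words seen so far
        self.open = []     # stack of (tag, start_token_index)
        self.spans = []    # finished (tag, start, end) triples

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so "<B>" also matches "b".
        if tag in MARKUP_TAGS:
            self.open.append((tag, len(self.tokens)))

    def handle_endtag(self, tag):
        # Close the most recent matching open tag; stray closes are ignored.
        for k in range(len(self.open) - 1, -1, -1):
            if self.open[k][0] == tag:
                _, start = self.open.pop(k)
                end = len(self.tokens)
                if end > start:  # drop empty spans such as <b></b>
                    self.spans.append((tag, start, end))
                return

    def handle_data(self, data):
        # Naive tokenization: punctuation stays attached to words.
        self.tokens.extend(data.split())


def markup_bracketings(html):
    """Return (tokens, spans); spans are ordered by start, outermost first."""
    parser = MarkupBracketer()
    parser.feed(html)
    parser.close()
    spans = sorted(parser.spans, key=lambda s: (s[1], -s[2]))
    return parser.tokens, spans
```

Consider `markup_bracketings('The <a href="x">Wall Street Journal</a> reported <b>strong</b> gains.')`. It yields the tokens `['The', 'Wall', 'Street', 'Journal', 'reported', 'strong', 'gains.']` and the spans `[('a', 1, 4), ('b', 5, 6)]`. The stack makes nested tags produce nested spans. Real web text would need a proper tokenizer, because "gains." shows punctuation staying attached to words.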
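The right-branching baseline named in the introduction, and a directed dependency accuracy for scoring a parser against gold trees, can each be sketched in a few lines. This is an illustration under stated assumptions, not code from the paper. Heads are 0-based word indices, with -1 for the root. "Right-branching" is taken to mean that each word is headed by the word to its left, so the first word is the root. That convention is our assumption. The scoring pools counts over all tokens, which is also assumed here. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

ROOT = -1  # head index used for the sentence root (assumed encoding)


def right_branching_heads(n: int) -> List[int]:
    """Right-branching baseline for an n-word sentence.

    Assumed convention: word i is headed by word i - 1, so word 0 is the root.
    """
    return [ROOT if i == 0 else i - 1 for i in range(n)]


def directed_accuracy(pairs: Sequence[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Token-level directed accuracy over (predicted_heads, gold_heads) pairs.

    Counts are pooled across sentences, so the result is a per-token rate,
    not an average of per-sentence scores.
    """
    correct = total = 0
    for predicted, gold in pairs:
        if len(predicted) != len(gold):
            raise ValueError("predicted and gold head lists differ in length")
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical gold tree: word 1 is the root; words 0 and 2 depend on it.
    gold = [1, ROOT, 1]
    baseline = right_branching_heads(len(gold))
    print(baseline, directed_accuracy([(baseline, gold)]))
```

Take a three-word sentence whose gold heads are `[1, -1, 1]`. The baseline heads `[-1, 0, 1]` get one of the three attachments right, a directed accuracy of 1/3. Because counts are pooled, longer sentences carry proportionally more weight in a corpus-level score.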
