Scientific report: "Linguistically Motivated Large-Scale NLP with C&C and Boxer"

James R. Curran, School of Information Technologies, University of Sydney, NSW 2006, Australia, james@
Stephen Clark, Computing Laboratory, Oxford University, Wolfson Building, Parks Road, Oxford OX1 3QD, UK
Johan Bos, Dipartimento di Informatica, Università di Roma La Sapienza, via Salaria 113, 00198 Roma, Italy, bos@

1 Introduction

The statistical modelling of language, together with advances in wide-coverage grammar development, has led to high levels of robustness and efficiency in NLP systems and made linguistically motivated large-scale language processing a possibility (Matsuzaki et al., 2007; Kaplan et al., 2004). This paper describes an NLP system which is based on syntactic and semantic formalisms from theoretical linguistics, and which we have used to analyse the entire Gigaword corpus (1 billion words) in less than 5 days using only 18 processors. This combination of detail and speed of analysis represents a breakthrough in NLP technology.

The system is built around a wide-coverage Combinatory Categorial Grammar (CCG) parser (Clark and Curran, 2004b). The parser recovers not only the local dependencies output by treebank parsers such as Collins (2003), but also the long-range dependencies inherent in constructions such as extraction and coordination.
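To give a feel for the scale quoted above, the following is a rough back-of-the-envelope sketch, not a figure reported by the authors. It assumes exactly 1 billion words, the full 5 days, and all 18 processors busy for the whole run; since the analysis took less than 5 days, the result is only a lower bound on the average rate. The function name is hypothetical.

```python
# Back-of-the-envelope throughput implied by the figures in the text.
# Assumptions (not from the paper): exactly 1e9 words, a full 5 days,
# and perfect utilisation of all 18 processors.

WORDS = 1_000_000_000
DAYS = 5
PROCESSORS = 18
SECONDS_PER_DAY = 24 * 60 * 60


def words_per_second_per_processor(words: int, days: float, processors: int) -> float:
    """Average words analysed per second by a single processor."""
    processor_seconds = days * SECONDS_PER_DAY * processors
    return words / processor_seconds


rate = words_per_second_per_processor(WORDS, DAYS, PROCESSORS)
print(f"{rate:.1f} words per second per processor")  # prints 128.6 ...
```

Under these assumptions each processor averages at least roughly 129 words per second over the whole corpus.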
CCG is a lexicalized grammar formalism, so that each word in a sentence is assigned an elementary syntactic structure, in CCG's case a lexical category expressing subcategorisation information. Statistical tagging techniques can assign lexical categories with high accuracy and low ambiguity (Curran et al., 2006). The combination of finite-state supertagging and highly engineered C++ leads to a parser which can analyse up to 30 sentences per second on standard hardware (Clark and Curran, 2004a). The C&C tools also contain a number of Maximum Entropy taggers, including the CCG supertagger and a POS tagger (Curran and Clark, 2003a).
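The idea that a lexical category encodes subcategorisation can be illustrated with a minimal sketch. This is a toy model, not the C&C implementation: the lexicon is hand-written rather than produced by a supertagger, and the names Atom, Functor, forward_apply, backward_apply and derive_svo are hypothetical. It uses only the two standard CCG application rules, X/Y Y => X and Y X\Y => X.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    """An atomic category such as S or NP."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Functor:
    """A complex category: result/arg seeks arg to the right, result\\arg to the left."""
    result: "Category"
    slash: str
    arg: "Category"

    def __str__(self) -> str:
        return f"({self.result}{self.slash}{self.arg})"


Category = Union[Atom, Functor]


def forward_apply(left: Category, right: Category) -> Optional[Category]:
    """Forward application: X/Y  Y  =>  X."""
    if isinstance(left, Functor) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def backward_apply(left: Category, right: Category) -> Optional[Category]:
    """Backward application: Y  X\\Y  =>  X."""
    if isinstance(right, Functor) and right.slash == "\\" and right.arg == left:
        return right.result
    return None


S, NP = Atom("S"), Atom("NP")
# A transitive verb subcategorises for an object NP to its right, then a
# subject NP to its left: (S\NP)/NP.
TRANSITIVE_VERB = Functor(Functor(S, "\\", NP), "/", NP)

# Toy hand-written lexicon (an assumption for illustration only).
LEXICON = {"John": NP, "Mary": NP, "likes": TRANSITIVE_VERB}


def derive_svo(subject: str, verb: str, obj: str) -> Optional[Category]:
    """Combine verb with object, then subject with the resulting verb phrase."""
    verb_phrase = forward_apply(LEXICON[verb], LEXICON[obj])
    if verb_phrase is None:
        return None
    return backward_apply(LEXICON[subject], verb_phrase)


print(derive_svo("John", "likes", "Mary"))  # prints S
```

Because the verb's category already states which arguments it needs and on which side, the combinatory rules themselves stay very small. Assigning the right category to each word is then the main disambiguation step, which is the job the text gives to the supertagger.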
