Scientific report: "A language-independent shallow-parser Compiler"

Alexandra Kinyon
CIS Dpt., University of Pennsylvania

Abstract

We present a rule-based shallow-parser compiler that makes it possible to generate a robust shallow-parser for any language, even in the absence of training data, by resorting to a very limited number of rules which aim at identifying constituent boundaries. We contrast our approach with other approaches used for shallow-parsing (i.e. finite-state and probabilistic methods). We present an evaluation of our tool for English (Penn Treebank) and for French (newspaper corpus "LeMonde") on several tasks (NP-chunking and "deeper" parsing).

1 Introduction

Full syntactic parsers of unrestricted text are costly to develop, costly to run, and often yield errors because of the lack of robustness of wide-coverage grammars and problems of attachment. This has led, as early as 1958 (Joshi & Hopely 97), to the development of shallow-parsers, which aim at identifying as quickly and accurately as possible the main constituents (and possibly syntactic functions) in an input, without dealing with the most difficult problems encountered with full parsing. Hence, shallow-parsers are very practical tools. There are two main techniques used to develop shallow-parsers:

1- Probabilistic techniques (e.g. Magerman 94, Ratnaparkhi 97, Daelmans et al. 99)
2- Finite-state techniques (e.g. Grefenstette 96)

Probabilistic techniques require large amounts of syntactically annotated training data, which makes them unsuitable for languages for which no such data is available (i.e. most languages except English). They are also neither domain-independent nor style-independent (i.e. they cannot successfully shallow-parse speech if no annotated data is available for that style). Finally, a shallow-parser developed using these techniques will have to mirror the information contained in the training data. For instance, if one trains such a tool on data where only non-recursive NP chunks are ...
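
To make the abstract's idea of identifying constituent boundaries with a handful of rules more concrete, here is a minimal sketch of a non-recursive NP chunker driven by two small sets of part-of-speech tags. It is only an illustration: the function name `chunk_np`, the tag sets `NP_OPENERS` and `NP_INSIDE`, and the Penn-Treebank-style tags are assumptions made for this example, not the rule formalism or output of the compiler described in the paper.

```python
# Toy boundary-rule NP chunker (illustrative assumption, not the paper's system).
# Input: a list of (word, POS tag) pairs using Penn-Treebank-style tags.

NP_OPENERS = {"DT", "PRP$", "JJ", "NN", "NNS", "NNP", "CD"}  # tags that may open an NP
NP_INSIDE = {"JJ", "NN", "NNS", "NNP", "CD"}                  # tags that may continue an NP


def chunk_np(tagged):
    """Return non-recursive NP chunks, each as a list of words."""
    chunks, current = [], []
    for word, tag in tagged:
        if current and tag in NP_INSIDE:
            current.append(word)      # still inside the currently open NP
        else:
            if current:               # any other tag closes the open NP
                chunks.append(current)
                current = []
            if tag in NP_OPENERS:     # boundary rule: this tag opens a new NP
                current = [word]
    if current:                       # close an NP still open at end of input
        chunks.append(current)
    return chunks


sentence = [
    ("The", "DT"), ("old", "JJ"), ("parser", "NN"), ("reads", "VBZ"),
    ("a", "DT"), ("long", "JJ"), ("sentence", "NN"), (".", "."),
]
print(chunk_np(sentence))
```

The sketch needs no training data: the only language-specific knowledge is the two tag sets, so adapting it to another language would mean editing those sets rather than annotating a corpus. A real rule compiler would presumably cover more constituent types and handle tagging errors, which this example does not attempt.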
