Scientific report: "Lexicon acquisition with a large-coverage unification-based grammar"

Frederik Fouvry
Computational Linguistics, Saarland University, PO Box 15 11 50, D-66041 Saarbrücken, Germany
fouvry@

Abstract

We describe how unknown lexical entries are processed in a unification-based framework with large-coverage grammars, and how lexical entries are extracted from their usage. To keep the time and space usage during parsing within bounds, information from external sources like Part of Speech (PoS) taggers and morphological analysers is taken into account when information is constructed for unknown words.

1 Introduction

For Natural Language Processing (NLP) in general, and for processing with linguistically rich frameworks more specifically, unknown words are a problem. The following gives an idea of the extent of the problem. In an evaluation of a large-scale grammar for unrestricted text on a newspaper corpus, we found that the number of failed parses due to unknown words accounted for around 89% of the total number of unsuccessful analyses. Even though this figure does not say anything about the grammar (these failures may be hiding many others), it shows the importance of the problem.

For unification-based implementations, which often refer to linguistic theories and are therefore rich in information, one approach to deal with unknown words has been proposed several times: to exploit the syntactic context of completed analyses to collect information about a new word. A few implementations have been developed to demonstrate the feasibility of the technique, but to our knowledge it has not yet been applied to large-coverage grammars. In this note we discuss how we are applying it to such a grammar for unrestricted text. Starting from this standard technique, we extend it and integrate PoS and morphological information originating from external resources. We will first describe the method of learning information from the syntactic context. Then we discuss the current results of our .
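
The general idea of collecting information about a new word from its syntactic context can be illustrated with a small sketch. This is not the implementation discussed in this note: it assumes feature structures are plain nested dictionaries without types or reentrancy, and the names `unify`, `candidate_entries`, `acquire`, the feature names and the example word "blick" are all hypothetical. The PoS tags stand in for what an external tagger might propose.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts or atomic values).

    Returns the merged structure, or None if the two are incompatible.
    Simplification: no types and no reentrancy (structure sharing).
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feat, val in b.items():
            if feat in result:
                sub = unify(result[feat], val)
                if sub is None:
                    return None  # clash somewhere below this feature
                result[feat] = sub
            else:
                result[feat] = val
        return result
    # Atomic values unify only if they are equal.
    return a if a == b else None


def candidate_entries(pos_tags):
    """Hypothetical: one underspecified entry per tag proposed by a PoS tagger."""
    return [{"cat": tag} for tag in pos_tags]


def acquire(pos_tags, context):
    """Keep the candidates that are compatible with the constraints
    a completed analysis imposed on the unknown word's position."""
    learned = []
    for entry in candidate_entries(pos_tags):
        merged = unify(entry, context)
        if merged is not None:
            learned.append(merged)
    return learned


# Constraints that an analysis of "the blick sleeps" might impose on "blick"
# (assumed for illustration only).
context = {"cat": "noun", "agr": {"num": "sg", "per": "3"}}
entries = acquire(["noun", "verb"], context)
print(entries)
```

Restricting the initial candidates to the tags proposed by the tagger is what keeps the search small in this sketch: the verb candidate is discarded by a single failed unification instead of being carried through the parse.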
