TAILIEUCHUNG - Scientific report on: 'Parsing and Subcategorization Data'

Parsing and Subcategorization Data
Jianguo Li and Chris Brew
Department of Linguistics, The Ohio State University, Columbus, OH, USA
jianguo cbrew @

Abstract

In this paper, we compare the performance of a state-of-the-art statistical parser (Bikel, 2004) in parsing written and spoken language and in generating subcategorization cues from written and spoken language. Although Bikel's parser achieves a higher accuracy for parsing written language, it achieves a higher accuracy when extracting subcategorization cues from spoken language. Our experiments also show that current technology for extracting subcategorization frames, initially designed for written texts, works equally well for spoken language. Additionally, we explore the utility of punctuation in helping parsing and extraction of subcategorization cues. Our experiments show that punctuation is of little help in parsing spoken language and extracting subcategorization cues from spoken language. This indicates that there is no need to add punctuation in transcribing spoken corpora simply in order to help parsers.

1 Introduction

Robust statistical syntactic parsers, made possible by new statistical techniques (Collins, 1999; Charniak, 2000; Bikel, 2004) and by the availability of large, hand-annotated training corpora such as WSJ (Marcus et al., 1993) and Switchboard (Godfrey et al., 1992), have had a major impact on the field of natural language processing.
There are many ways to make use of parsers' output. One particular form of data that can be extracted from parses is information about subcategorization. Subcategorization data comes in two forms: subcategorization frames (SCFs) and subcategorization cues (SCCs). SCFs differ from SCCs in that SCFs contain only arguments, while SCCs contain both arguments and adjuncts. Both SCFs and SCCs have been crucial to NLP tasks. For example, SCFs have been used for verb disambiguation and classification (Schulte im Walde, 2000; Merlo and Stevenson, 2001; Lapata and Brew, 2004; Merlo et al. ...
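
To make the distinction between cues and frames concrete, the following is a minimal Python sketch, not the authors' extraction procedure. The `Complement` class, the `is_argument` flag, and the two helper functions are hypothetical names introduced only for illustration; in particular, the flag is assigned by hand here, whereas a real system would have to decide argument status from the parse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Complement:
    """A phrase following the verb (hypothetical representation)."""
    category: str      # phrase label, e.g. "NP" or "PP"
    is_argument: bool  # assumption: hand-assigned here, not derived from a parse


def subcat_cue(complements):
    """SCC: labels of all phrases after the verb, arguments and adjuncts alike."""
    return tuple(c.category for c in complements)


def subcat_frame(complements):
    """SCF: labels of only those phrases marked as arguments."""
    return tuple(c.category for c in complements if c.is_argument)


# "She put [the book] [on the table] [on Monday]"
example = [
    Complement("NP", is_argument=True),   # the book
    Complement("PP", is_argument=True),   # on the table
    Complement("PP", is_argument=False),  # on Monday (temporal adjunct)
]

print(subcat_cue(example))    # ('NP', 'PP', 'PP')
print(subcat_frame(example))  # ('NP', 'PP')
```

In this toy case the cue keeps the temporal phrase while the frame drops it, which is the only difference between the two forms described above.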
