TAILIEUCHUNG - Scientific report: "Finding Hedges by Chasing Weasels: Hedge Detection Using Wikipedia Tags and Shallow Linguistic Features"

Viola Ganter and Michael Strube
EML Research gGmbH, Heidelberg, Germany

Abstract

We investigate the automatic detection of sentences containing linguistic hedges using corpus statistics and syntactic patterns. We take Wikipedia as an already annotated corpus, using its tagged weasel words, which mark sentences and phrases as non-factual. We evaluate the quality of Wikipedia as training data for hedge detection, as well as shallow linguistic features.

1 Introduction

While most research in natural language processing deals with identifying, extracting, and classifying facts, recent years have seen a surge in research on sentiment and subjectivity (see Pang & Lee, 2008, for an overview). However, even opinions have to be backed up by facts to be effective as arguments. Distinguishing facts from fiction requires detecting subtle variations in the use of linguistic devices such as linguistic hedges, which indicate that speakers do not back up their opinions with facts (Lakoff, 1973; Hyland, 1998).

Many NLP applications could benefit from identifying linguistic hedges, e.g. question answering systems (Riloff et al., 2003), information extraction from biomedical documents (Medlock & Briscoe, 2007; Szarvas, 2008), and deception detection (Bachenko et al., 2008).

While NLP research on classifying linguistic hedges has been restricted to analysing biomedical documents, the above incomplete list of applications suggests that domain- and language-independent approaches for hedge detection need to be developed.
We investigate Wikipedia as a source of training data for hedge classification. We adopt Wikipedia's notion of weasel words, which we argue is closely related to hedges and private states. Many Wikipedia articles contain a specific weasel tag, so that Wikipedia can be viewed as a readily annotated corpus. Based on this data, we have built a system to detect sentences that
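
To illustrate how weasel-tagged wikitext could serve as an annotated corpus, the following is a minimal sketch in Python, not the authors' system. It assumes the tag appears as an inline template such as `{{weasel-inline}}` or `{{weasel word}}`; the excerpt only speaks of "a specific weasel tag", so these template names, the helper `label_sentences`, and the naive sentence splitter are all assumptions made for illustration.

```python
import re

# ASSUMPTION: the weasel tag is an inline wikitext template named
# "weasel-inline" or "weasel word" (optionally with parameters after "|").
# The excerpt itself does not name the template.
WEASEL_TAG = re.compile(
    r"\{\{\s*weasel[ -](?:inline|word)\s*(?:\|[^{}]*)?\}\}",
    re.IGNORECASE,
)

# Naive sentence boundary: whitespace that follows ., ! or ?
# (a real system would use a proper sentence splitter).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def label_sentences(wikitext):
    """Return (sentence, is_tagged) pairs, with the tags stripped from the text."""
    labeled = []
    for raw in SENTENCE_END.split(wikitext.strip()):
        if not raw:
            continue
        # A sentence counts as a positive example if it contains a tag.
        is_tagged = WEASEL_TAG.search(raw) is not None
        # Remove the tag and normalise whitespace left behind.
        clean = re.sub(r"\s+", " ", WEASEL_TAG.sub("", raw)).strip()
        labeled.append((clean, is_tagged))
    return labeled


if __name__ == "__main__":
    sample = (
        "Some critics say{{weasel-inline}} that the film is overrated. "
        "It was released in 1999."
    )
    for sentence, is_tagged in label_sentences(sample):
        print(is_tagged, sentence)
```

Sentences labeled this way could then be used as positive and negative training examples. The sketch only handles tags placed inside the sentence they mark; tags placed after the closing punctuation would need extra handling.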
