Scientific report: "You Can’t Beat Frequency (Unless You Use Linguistic Knowledge) – A Qualitative Evaluation of Association Measures for Collocation and Term Extraction"

Joachim Wermter, Udo Hahn
Jena University Language & Information Engineering (JULIE) Lab, D-07743 Jena, Germany
wermter hahn @

Abstract

In the past years, a number of lexical association measures have been studied to help extract new scientific terminology or general-language collocations. The implicit assumption of this research was that newly designed term measures involving more sophisticated statistical criteria would outperform simple counts of cooccurrence frequencies. We here explicitly test this assumption. By way of four qualitative criteria, we show that purely statistics-based measures reveal virtually no difference compared with frequency of occurrence counts, while linguistically more informed metrics do reveal such a marked difference.

1 Introduction

Research on domain-specific automatic term recognition (ATR) and on general-language collocation extraction (CE) has gone mostly separate ways in the last decade, although their underlying procedures and goals turn out to be rather similar. In both cases, linguistic filters (POS taggers, phrase chunkers, shallow parsers) initially collect candidates from large text corpora, and then frequency- or statistics-based evidence or association measures yield scores indicating to what degree a candidate qualifies as a term or a collocation. While term mining and collocation mining, as a whole, involve almost the same analytical processing steps (such as orthographic and morphological normalization, normalization of term or collocation variation, etc.), it is exactly the measure which grades termhood or collocativity of a candidate on which alternative approaches diverge. Still, the output of such mining algorithms looks similar. It is typically constituted by a ranked list on which, ideally, the true terms or collocations are placed in the top portion of the list, while the non-terms
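
The two-step procedure described in the introduction (collect candidates, then score and rank them) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `rank_candidates` is hypothetical, adjacent word pairs stand in for the output of a real linguistic filter, and pointwise mutual information (PMI) is used only as a familiar example of a statistics-based association measure. The text above does not say which measures are evaluated, so this should not be read as the authors' method.

```python
import math
from collections import Counter


def rank_candidates(tokens):
    """Score adjacent word pairs by raw frequency and by PMI.

    Assumption: every adjacent pair counts as a candidate. A real system
    would first apply a linguistic filter (POS tagger, chunker, parser).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), freq in bigrams.items():
        p_pair = freq / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        # PMI compares the observed pair probability with the probability
        # expected if the two words occurred independently.
        pmi = math.log2(p_pair / (p_w1 * p_w2))
        scores[(w1, w2)] = (freq, pmi)

    # Two ranked lists over the same candidates; ties are broken
    # alphabetically so the ordering is deterministic.
    by_freq = sorted(scores, key=lambda c: (-scores[c][0], c))
    by_pmi = sorted(scores, key=lambda c: (-scores[c][1], c))
    return scores, by_freq, by_pmi


# Toy corpus, invented for this example.
tokens = "gene expression is high and gene expression is low".split()
scores, by_freq, by_pmi = rank_candidates(tokens)
print(by_freq[:3])
print(by_pmi[:3])
```

Both rankings cover the same candidate set and differ only in the scoring function, which is the point on which, according to the introduction, alternative approaches diverge. On a corpus this small, PMI tends to favor pairs of rare words, so the two lists need not agree.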
