Scientific report: "Linguistic Profiling for Author Recognition and Verification"

Hans van Halteren
Language and Speech, Univ. of Nijmegen
P.O. Box 9103, NL-6500 HD Nijmegen, The Netherlands
hvh@

Abstract

A new technique is introduced, linguistic profiling, in which large numbers of counts of linguistic features are used as a text profile, which can then be compared to average profiles for groups of texts. The technique proves to be quite effective for authorship verification and recognition. The best parameter settings yield a False Accept Rate of [...] at a False Reject Rate equal to zero for the verification task on a test corpus of student essays, and a 2-way recognition accuracy of [...] on the same corpus.

1 Introduction

There are several situations in language research or language engineering where we need a specific type of extra-linguistic information about a text document, and we would like to determine this information on the basis of linguistic properties of the text. Examples are the determination of the language variety or genre of a text, or a classification for document routing or information retrieval. For each of these applications, techniques have been developed that focus on specific aspects of the text, often based on frequency counts of function words in linguistics and of content words in language engineering.

In the technique we are introducing in this paper, linguistic profiling, we make no a priori choice for a specific type of word (or more complex feature) to be counted. Instead, all possible features are included, and it is determined by the statistics for the texts under consideration, and the distinction to be made, how much weight, if any, each feature is to receive. Furthermore, the frequency counts are not used as absolute values, but rather as deviations from a norm, which is again determined by the situation at hand. Our hypothesis is that this technique can bring a useful contribution to all tasks where it is necessary to distinguish one group of texts from [...]
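
The idea of a profile built from feature counts, expressed as deviations from a norm and compared to a group's average profile, can be illustrated with a small sketch. This is an illustration only: the excerpt does not say how profiles are computed or compared, so the code assumes word unigrams as the features, the per-feature mean and standard deviation over a reference set of texts as the norm, and the mean absolute difference as the comparison. All function names are hypothetical, and none of these choices is claimed to be the paper's own.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z']+")  # assumption: features are lowercase word unigrams


def relative_counts(text, vocabulary):
    """Relative frequency of each vocabulary feature in one text."""
    tokens = TOKEN.findall(text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty texts
    return [counts[feature] / total for feature in vocabulary]


def fit_norm(texts):
    """Vocabulary plus per-feature mean and standard deviation over reference texts."""
    vocabulary = sorted({tok for text in texts for tok in TOKEN.findall(text.lower())})
    rows = [relative_counts(text, vocabulary) for text in texts]
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / len(col))
            for col, m in zip(columns, means)]
    return vocabulary, means, stds


def profile(text, norm):
    """Text profile: each count expressed as a deviation from the norm (a z-score)."""
    vocabulary, means, stds = norm
    freqs = relative_counts(text, vocabulary)
    return [(f - m) / s if s > 0 else 0.0 for f, m, s in zip(freqs, means, stds)]


def average_profile(texts, norm):
    """Average profile for a group of texts, e.g. all known texts by one author."""
    profiles = [profile(text, norm) for text in texts]
    return [sum(col) / len(col) for col in zip(*profiles)]


def distance(p, q):
    """Mean absolute difference between two profiles; smaller means more similar."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


if __name__ == "__main__":
    # Toy data invented for this sketch, not taken from the paper's corpus.
    author_a = ["the cat sat on the mat and the dog sat too",
                "the bird and the cat sat on the wall"]
    author_b = ["i think that i will go since i must",
                "i said that i must think since i will"]
    norm = fit_norm(author_a + author_b)
    unknown = profile("the dog and the bird sat on the mat", norm)
    print("distance to A:", distance(unknown, average_profile(author_a, norm)))
    print("distance to B:", distance(unknown, average_profile(author_b, norm)))
```

In this sketch, recognition would pick the group whose average profile is closest, while verification would accept a text only if its distance to the claimed author's profile fell below a threshold. Both the threshold and any per-feature weighting are left out here, since the excerpt does not describe them.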
