Automatic Authorship Attribution

Proceedings of EACL 99

E. Stamatatos, N. Fakotakis and G. Kokkinakis
Dept. of Electrical and Computer Engineering
University of Patras
26500 Patras, Greece
stamatatos@

Abstract

In this paper we present an approach to automatic authorship attribution dealing with real-world (or unrestricted) text. Our method is based on the computational analysis of the input text using a text-processing tool. Besides the style markers relevant to the output of this tool, we also use analysis-dependent style markers, that is, measures that represent the way in which the text has been processed. Neither word frequency counts nor other lexically based measures are taken into account. We show that the proposed set of style markers is able to distinguish texts of various authors of a weekly newspaper using multiple regression. All the experiments we present were performed using real-world text downloaded from the World Wide Web. Our approach is easily trainable and fully automated, requiring neither manual text preprocessing nor sampling.

1 Introduction

The vast majority of attempts at computer-assisted authorship attribution have focused on literary texts. In particular, much attention has been paid to establishing the authorship of anonymous or doubtful texts.
A typical paradigm is the case of the Federalist Papers, twelve of which are of disputed authorship (Mosteller and Wallace, 1984; Holmes and Forsyth, 1995). Moreover, the lack of a generic and formal definition of the idiosyncratic style of an author has led to the employment of statistical methods (discriminant analysis, principal components, etc.). Nowadays, the wealth of electronic text available on the World Wide Web for a wide variety of genres and languages, together with the development of reliable text-processing tools, opens the way to solving the authorship attribution problem for real-world text. The most important approaches to authorship attribution ...
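The abstract states that the proposed style markers distinguish authors by means of multiple regression. The sketch below shows one way such a classifier could work, assuming a scheme that this excerpt does not spell out: one ordinary least-squares model is fitted per author, with a response of 1 for that author's texts and 0 otherwise, and a new text is attributed to the author whose model gives the largest response. The three markers, their values, the author labels, and all function names (`fit_ols`, `train`, `attribute`, etc.) are hypothetical and invented for illustration. They are not the paper's markers, tool output, or data.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular (collinear style markers?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows: List[Vector], targets: List[float]) -> List[float]:
    """Ordinary least squares with an intercept, via the normal equations X^T X b = X^T y."""
    design = [[1.0, *row] for row in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(r[p] * r[q] for r in design) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * y for r, y in zip(design, targets)) for p in range(k)]
    return solve(xtx, xty)


def predict(coefs: List[float], row: Vector) -> float:
    """Regression response: intercept plus weighted sum of the style markers."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def train(samples: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Fit one regression per author: response 1.0 for that author's texts, 0.0 otherwise."""
    rows = [vec for vecs in samples.values() for vec in vecs]
    labels = [author for author, vecs in samples.items() for _ in vecs]
    return {
        author: fit_ols(rows, [1.0 if lab == author else 0.0 for lab in labels])
        for author in samples
    }


def attribute(models: Dict[str, List[float]], row: Vector) -> str:
    """Assign a text to the author whose model gives the largest response."""
    return max(models, key=lambda author: predict(models[author], row))


# Hypothetical style-marker vectors (invented for illustration, not data from the paper):
# [mean sentence length in words, punctuation marks per 100 words,
#  % of tokens the text-processing tool left unanalyzed]
TRAINING = {
    "author_a": [[29, 10, 6], [23, 10, 6], [26, 11, 4], [26, 9, 4]],
    "author_b": [[20, 12, 6], [14, 12, 6], [17, 13, 4], [17, 11, 4]],
    "author_c": [[20, 8, 6], [14, 8, 6], [17, 9, 4], [17, 7, 4]],
}

if __name__ == "__main__":
    models = train(TRAINING)
    for text_markers in ([25, 10.4, 5.5], [16, 12.6, 4.8], [18, 7.8, 5.2]):
        print(text_markers, "->", attribute(models, text_markers))
```

Run as a script, it should attribute the three unseen vectors to `author_a`, `author_b`, and `author_c` respectively. The normal equations are solved in pure Python only to keep the sketch dependency-free; with many or highly correlated markers, a library solver such as `numpy.linalg.lstsq` is the numerically safer choice.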