Local Histograms of Character N-grams for Authorship Attribution

Hugo Jair Escalante, Graduate Program in Systems Eng., Universidad Autonoma de Nuevo Leon, San Nicolas de los Garza, NL 66450, Mexico
Thamar Solorio, Dept. of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, AL 35294, USA, solorio@
Manuel Montes-y-Gomez, Computer Science Department, INAOE, Tonantzintla, Puebla 72840, Mexico; Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, AL 35294, USA, mmontesg@

Abstract
This paper proposes the use of local histograms (LH) over character n-grams for authorship attribution (AA). LHs are enriched histogram representations that preserve sequential information in documents; they have been successfully used for text categorization and document visualization using word histograms. In this work we explore the suitability of LHs over n-grams at the character level for AA. We show that LHs are particularly helpful for AA, because they provide useful information for uncovering, to some extent, the writing style of authors. We report experimental results on AA data sets that confirm that LHs over character n-grams are more helpful for AA than the usual global histograms, yielding results far superior to state-of-the-art approaches. We found that LHs are even more advantageous in challenging conditions, such as imbalanced and small training sets. Our results motivate further research on the use of LHs for modeling the writing style of authors in related tasks, such as authorship verification and plagiarism detection.

1 Introduction
Authorship attribution (AA) is the task of deciding who, from a set of candidates, is the author of a given document (Houvardas and Stamatatos, 2006; Luyckx and Daelemans, 2010; Stamatatos, 2009b). There is a broad field of application for AA methods, including spam filtering (de Vel et al., 2001), fraud detection, and computer forensics (Lambers and Veenman, 2009).
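The abstract describes LHs only in words. The Python sketch below shows one common way to compute them over character n-grams; it is an illustration, not the authors' implementation. Each n-gram at index j is mapped to a relative position t_j in [0, 1]. For each location μ, the sketch builds a histogram in which that n-gram contributes the weight exp(-(t_j − μ)²/(2σ²)), and then normalizes the histogram to sum to 1. The following choices are all assumptions made for illustration: n = 3, the five locations in `LOCATIONS`, σ = 0.2, the position mapping, the summed L1 distance, and the 1-nearest-neighbour rule in the hypothetical `attribute` helper.

```python
import math
from collections import Counter, defaultdict

# Illustrative defaults (assumptions, not values taken from the paper).
LOCATIONS = (0.0, 0.25, 0.5, 0.75, 1.0)
SIGMA = 0.2


def char_ngrams(text, n=3):
    """Sequence of overlapping character n-grams, in document order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def global_histogram(grams):
    """Usual bag-of-n-grams histogram, normalized to sum to 1."""
    if not grams:
        raise ValueError("document has no n-grams")
    counts = Counter(grams)
    total = len(grams)
    return {g: c / total for g, c in counts.items()}


def local_histograms(grams, locations=LOCATIONS, sigma=SIGMA):
    """One Gaussian-weighted, normalized histogram per location mu in [0, 1].

    The n-gram at index j sits at relative position t_j = j / (N - 1)
    (an assumed mapping) and adds exp(-(t_j - mu)^2 / (2 * sigma^2)) to the
    histogram at mu. Constant kernel normalizers cancel when each
    histogram is rescaled to sum to 1, so they are omitted.
    """
    if not grams:
        raise ValueError("document has no n-grams")
    n = len(grams)
    lhs = []
    for mu in locations:
        hist = defaultdict(float)
        for j, g in enumerate(grams):
            t = j / (n - 1) if n > 1 else 0.5  # a lone n-gram sits mid-document
            hist[g] += math.exp(-((t - mu) ** 2) / (2 * sigma ** 2))
        total = sum(hist.values())
        lhs.append({g: w / total for g, w in hist.items()})
    return lhs


def l1_distance(p, q):
    """L1 distance between two sparse histograms stored as dicts."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def lh_distance(a, b):
    """Sum of L1 distances between LHs at corresponding locations."""
    return sum(l1_distance(p, q) for p, q in zip(a, b))


def attribute(query_text, training, n=3):
    """Hypothetical 1-nearest-neighbour attribution over (author, text) pairs."""
    q = local_histograms(char_ngrams(query_text, n))
    best_author, _ = min(
        training,
        key=lambda pair: lh_distance(q, local_histograms(char_ngrams(pair[1], n))),
    )
    return best_author


if __name__ == "__main__":
    # Same character counts in opposite order: the global histograms
    # coincide, but the local histograms do not.
    a, b = char_ngrams("aaabbb", 1), char_ngrams("bbbaaa", 1)
    print(global_histogram(a) == global_histogram(b))
    print(lh_distance(local_histograms(a), local_histograms(b)) > 0)
    train = [("A", "the cat sat on the mat"), ("B", "zzz qqq xxx zzz qqq")]
    print(attribute("the cat sat on a mat", train))
```

Running the script prints `True`, `True` and `A`. The toy strings `"aaabbb"` and `"bbbaaa"` have the same global histogram but different LHs, which illustrates the sequential information that a global histogram discards. With a single location and a very large `sigma`, every weight becomes nearly equal, and the sketch approximately recovers the global histogram.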
