Scientific report: "Authorship Attribution Using Probabilistic Context-Free Grammars"

Sindhu Raghavan, Adriana Kovashka, Raymond Mooney
Department of Computer Science, The University of Texas at Austin
1 University Station C0500, Austin, TX 78712-0233, USA
sindhu, adriana, mooney @

Abstract

In this paper, we present a novel approach for authorship attribution, the task of identifying the author of a document, using probabilistic context-free grammars. Our approach involves building a probabilistic context-free grammar for each author and using this grammar as a language model for classification. We evaluate the performance of our method on a wide range of datasets to demonstrate its efficacy.

1 Introduction

Natural language processing allows us to build language models, and these models can be used to distinguish between languages. In the context of written text, such as newspaper articles or short stories, the author's style could be considered a distinct language. Authorship attribution, also referred to as authorship identification or prediction, studies strategies for discriminating between the styles of different authors. These strategies have numerous applications, including settling disputes regarding the authorship of old and historically important documents (Mosteller and Wallace, 1984), automatic plagiarism detection, determination of document authenticity in court (Juola and Sofko, 2004), cyber crime investigation (Zheng et al., 2009), and forensics (Luyckx and Daelemans, 2008).

The general approach to authorship attribution is to extract a number of style markers from the text and use these style markers as features to train a classifier (Burrows, 1987; Binongo and Smith, 1999; Diederich et al., 2000; Holmes and Forsyth, 1995; Joachims, 1998; Mosteller and Wallace, 1984). These style markers could include the frequencies of certain characters, function words, phrases, or sentences. Peng et al. (2003) build a character-level n-gram model for each author. Stamatatos et al. (1999) and Luyckx and Daelemans (2008) use a combination of word-level statistics and part-of-speech counts or n-grams.
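
The "one language model per author" pattern mentioned above can be illustrated with a minimal sketch. It trains a character-level n-gram model for each author and attributes a new text to the author whose model gives it the highest log-likelihood. This is not the paper's PCFG method, and it is not Peng et al.'s exact model: the class `CharNgramModel`, the helper functions, the add-one smoothing, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


class CharNgramModel:
    """Character n-gram language model with add-one smoothing (illustrative only)."""

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = Counter()    # (context, next_char) -> count
        self.context_counts = Counter()  # context -> count
        self.vocab = set()               # characters seen in training

    def _ngrams(self, text):
        # Left-pad with spaces so the first characters also have a context.
        padded = " " * (self.n - 1) + text
        for i in range(len(text)):
            yield padded[i:i + self.n - 1], padded[i + self.n - 1]

    def train(self, documents):
        for doc in documents:
            self.vocab.update(doc)
            for context, char in self._ngrams(doc):
                self.ngram_counts[(context, char)] += 1
                self.context_counts[context] += 1

    def log_likelihood(self, text):
        # One extra vocabulary slot reserves probability mass for unseen characters.
        vocab_size = len(self.vocab) + 1
        total = 0.0
        for context, char in self._ngrams(text):
            numerator = self.ngram_counts[(context, char)] + 1
            denominator = self.context_counts[context] + vocab_size
            total += math.log(numerator / denominator)
        return total


def train_author_models(corpus, n=3):
    """Build one model per author from {author: [documents]}."""
    models = {}
    for author, documents in corpus.items():
        model = CharNgramModel(n)
        model.train(documents)
        models[author] = model
    return models


def attribute(models, text):
    """Return the author whose model assigns the text the highest log-likelihood."""
    return max(models, key=lambda author: models[author].log_likelihood(text))


if __name__ == "__main__":
    # Hypothetical toy corpus; real experiments need far more text per author.
    corpus = {
        "author_a": ["the ship sailed at dawn and the sea was calm"],
        "author_b": ["markets fell sharply as investors sold shares"],
    }
    models = train_author_models(corpus)
    print(attribute(models, "the sea was calm at dawn"))
```

The same classification rule would apply if each author's model were a probabilistic context-free grammar instead of an n-gram model: only the way the likelihood of a document is computed changes.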
