Lost in Translation: Authorship Attribution using Frame Semantics

Steffen Hedegaard
Department of Computer Science, University of Copenhagen
Njalsgade 128, 2300 Copenhagen S, Denmark
steffenh@diku.dk

Jakob Grue Simonsen
Department of Computer Science, University of Copenhagen
Njalsgade 128, 2300 Copenhagen S, Denmark
simonsen@diku.dk

Abstract

We investigate authorship attribution using classifiers based on frame semantics. The purpose is to discover whether adding semantic information to lexical and syntactic methods for authorship attribution will improve them, specifically to address the difficult problem of authorship attribution of translated texts. Our results suggest (i) that frame-based classifiers are usable for author attribution of both translated and untranslated texts; (ii) that frame-based classifiers generally perform worse than the baseline classifiers for untranslated texts, but (iii) perform as well as, or superior to, the baseline classifiers on translated texts; and (iv) that, contrary to current belief, naïve classifiers based on lexical markers may perform tolerably on translated texts if the combination of author and translator is present in the training set of a classifier.
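The abstract contrasts frame-based classifiers with naïve classifiers based on lexical markers. For orientation, the following is a minimal sketch of such a lexical-marker classifier, written under stated assumptions and not the classifiers evaluated in this paper: the marker list `FUNCTION_WORDS`, the helper names, and the choice of a nearest-centroid rule with cosine similarity are all hypothetical illustrations. Each text is reduced to the relative frequencies of a fixed set of marker words, each candidate author is represented by the mean profile of their training texts, and an unseen text is assigned to the author with the most similar profile.

```python
from collections import Counter
import math

# Hypothetical marker set: a few English function words (illustrative only).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "was", "he"]


def marker_profile(text, markers=FUNCTION_WORDS):
    """Return the relative frequency of each marker word in `text`."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]  # drop tokens that were pure punctuation
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[m] / total for m in markers]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def train_centroids(training):
    """Map each author to the mean marker profile of their training texts.

    `training` is a dict: author -> list of texts known to be by that author.
    """
    centroids = {}
    for author, texts in training.items():
        profiles = [marker_profile(t) for t in texts]
        centroids[author] = [sum(col) / len(profiles) for col in zip(*profiles)]
    return centroids


def attribute(text, centroids):
    """Assign `text` to the candidate author whose centroid is most similar."""
    profile = marker_profile(text)
    return max(centroids, key=lambda author: cosine(profile, centroids[author]))


# Usage with two toy "authors" (invented example data).
training = {
    "A": ["the king of the land and the sea of the north"],
    "B": ["it was late and he said that it was over"],
}
centroids = train_centroids(training)
print(attribute("the end of the road and the start of the day", centroids))
print(attribute("he was sure that it was true", centroids))
```

A nearest-centroid rule is used here only because it keeps the sketch short; any standard classifier could consume the same marker profiles. Applied to translated texts, such profiles mix the habits of author and translator, which is the difficulty the paper addresses.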
1 Introduction

Authorship attribution is the following problem: For a given text, determine the author of said text among a list of candidate authors. Determining authorship is difficult, and a host of methods have been proposed: As of 1998, Rudman estimated the number of metrics used in such methods to be at least 1000 (Rudman, 1997). For comprehensive recent surveys, see e.g. (Juola, 2006; Koppel et al., 2008; Stamatatos, 2009). The process of authorship attribution consists of selecting markers (features) that provide an indication of the author, and classifying a text by assigning it to an author using some appropriate machine learning technique.

1.1 Attribution of translated texts

In contrast to the general authorship attribution problem, the specific problem of attributing translated texts to their original