Scientific report: "Obfuscating Document Stylometry to Preserve Author Anonymity"

Gary Kacmarcik, Michael Gamon
Natural Language Processing Group, Microsoft Research, Redmond, WA, USA
garykac, mgamon @

Abstract

This paper explores techniques for reducing the effectiveness of standard authorship attribution techniques so that an author A can preserve anonymity for a particular document D. We discuss feature selection and adjustment and show how this information can be fed back to the author to create a new document D′ for which the calculated attribution moves away from A. Since it can be labor intensive to adjust the document in this fashion, we attempt to quantify the amount of effort required to produce the anonymized document and introduce two levels of anonymization: shallow and deep. In our test set, we show that shallow anonymization can be achieved by making 14 changes per 1000 words to reduce the likelihood of identifying A as the author by an average of more than 83%. For deep anonymization, we adapt the unmasking work of Koppel and Schler to provide feedback that allows the author to choose the level of anonymization.

1 Introduction

Authorship identification has been a long-standing topic in the field of stylometry, the analysis of literary style (Holmes, 1998). Issues of style, genre, and authorship are an interesting sub-area of text categorization. In authorship detection it is not the topic of a text but rather the stylistic properties that are of interest. The writing style of a particular author can be identified by analyzing the form of the writing, rather than the content. The analysis of style therefore needs to abstract away from the content and focus on the content-independent form of the linguistic expressions in a text.

Advances in authorship attribution have raised concerns about whether or not authors can truly maintain their anonymity (Rao and Rohatgi, 2000). While there are clearly many reasons for wanting to unmask an anonymous author, notably law enforcement and
