TAILIEUCHUNG - Scientific report: "Towards the Orwellian Nightmare"

Towards the Orwellian Nightmare: Separation of Business and Personal Emails

Sanaz Jabbari, Ben Allison, David Guthrie, Louise Guthrie
Department of Computer Science, University of Sheffield
211 Portobello St., Sheffield, S1 4DP

Abstract

This paper describes the largest scale annotation project involving the Enron email corpus to date. Over 12,500 emails were classified, by humans, into the categories "Business" and "Personal", and then subcategorised by type within these categories. The paper quantifies how well humans perform on this task (evaluated by inter-annotator agreement). It presents the problems experienced with the separation of these language types. As a final section, the paper presents preliminary results using a machine to perform this classification task.

1 Introduction

Almost since it became a global phenomenon, computers have been examining and reasoning about our email. For the most part, this intervention has been good-natured and helpful: computers have been trying to protect us from attacks of unscrupulous blanket advertising mail shots. However, the use of computers for more nefarious surveillance of email has so far been limited. The sheer volume of email sent means that even government agencies, who can legally intercept all mail, must either filter email by some preconceived notion of what is interesting, or employ teams of people to manually sift through the volumes of data. For example, the NSA has had massive parallel machines filtering e-mail traffic for at least ten years.

The task of developing such automatic filters at research institutions has been almost impossible, but for the opposite reason. There is no shortage of willing researchers, but progress has been hampered by the lack of any data: one's email is often hugely private, and the prospect of surrendering it in its entirety for research purposes is somewhat unsavoury. Recently, a data resource has become available where exactly this ...
