Open Domain Event Extraction from Twitter

Maliciously modified devices are already a reality. In 2006, Apple shipped iPods infected with the RavMonE virus [4]. During the Cold War, the CIA sabotaged oil pipeline control software, which was then allowed to be "stolen" by Russian spies [10]. Conversely, Russian agents intercepted and modified typewriters which were to be used at the US embassy in Moscow; the modifications allowed the Russians to copy any documents typed on said typewriters [16]. Recently, external hard drives sold by Seagate in Taiwan were shipped with a trojan installed that sent personal data to a remote attacker [1]. Although none of these attacks use malicious circuits, they clearly show the feasibility of.

Open Domain Event Extraction from Twitter

Alan Ritter, University of Washington, Computer Sci. & Eng., Seattle, WA, aritter@
Mausam, University of Washington, Computer Sci. & Eng., Seattle, WA, mausam@
Sam Clark, Decide Inc., Seattle, WA
Oren Etzioni, University of Washington, Computer Sci. & Eng., Seattle, WA, etzioni@

ABSTRACT

Tweets are the most up-to-date and inclusive stream of information and commentary on current events, but they are also fragmented and noisy, motivating the need for systems that can extract, aggregate and categorize important events. Previous work on extracting structured representations of events has focused largely on newswire text; Twitter's unique characteristics present new challenges and opportunities for open-domain event extraction. This paper describes TwiCal, the first open-domain event-extraction and categorization system for Twitter. We demonstrate that accurately extracting an open-domain calendar of significant events from Twitter is indeed feasible. In addition, we present a novel approach for discovering important event categories and classifying extracted events based on latent variable models. By leveraging large volumes of unlabeled data, our approach achieves a 14% increase in maximum F1 over a supervised baseline.

A continuously updating demonstration of our system can be viewed at http. Our NLP tools are available at http aritter twitter_nlp.

Categories and Subject Descriptors: Natural Language Processing: Language parsing and understanding; Database Management: Database applications, data mining

General Terms: Algorithms, Experimentation

1. INTRODUCTION

Social networking sites such as Facebook and Twitter present the most up-to-date information and buzz about current

This work was conducted at the University of Washington.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee.
