Scientific report: "A Novel Burst-based Text Representation Model for Scalable Event Detection"

Wayne Xin Zhao, Rishan Chen, Kai Fan, Hongfei Yan* and Xiaoming Li
School of Electronics Engineering and Computer Science, Peking University, China
State Key Laboratory of Software, Beihang University, China
{batmanfly, tsunamicrs, fankaicn, yhf1029}@gmail.com, lxm@pku.edu.cn

Abstract

Mining retrospective events from text streams has been an important research topic. Classic text representation model (i.e., vector space model) cannot model temporal aspects of documents. To address it, we proposed a novel burst-based text representation model, denoted as BurstVSM. BurstVSM corresponds dimensions to bursty features instead of terms, which can capture semantic and temporal information. Meanwhile, it significantly reduces the number of non-zero entries in the representation. We test it via scalable event detection, and experiments in a 10-year news archive show that our methods are both effective and efficient.

1 Introduction

Mining retrospective events (Yang et al., 1998; Fung et al., 2007; Allan et al., 2000) has been quite an important research topic in text mining. One standard way for that is to cluster news articles as events by following a two-step approach (Yang et al., 1998): (1) represent document as vectors and calculate similarities between documents; (2) run the clustering algorithm to obtain document clusters as events.[1] Underlying text representation often plays a critical role in this approach, especially for long text streams.
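The two-step approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: raw term-frequency weights, cosine similarity, and a greedy single-pass clustering rule with a hypothetical `threshold` parameter are choices made here for brevity, not details given in the text.

```python
import math
from collections import Counter

def tf_vector(text):
    """Step 1a: represent a document as a sparse term-frequency vector."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Step 1b: cosine similarity between two sparse vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster(docs, threshold=0.5):
    """Step 2: greedy single-pass clustering (an assumed, simplified rule).

    Each resulting cluster is a list of document indices and stands for one event.
    """
    vectors = [tf_vector(d) for d in docs]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            # Compare against the first document of the cluster (its seed).
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster is similar enough: start a new event.
            clusters.append([i])
    return clusters

# Toy documents, invented for illustration only.
docs = [
    "earthquake hits city rescue teams arrive",
    "rescue teams search city after earthquake",
    "election results announced president wins",
]
events = cluster(docs)
print(events)
```

With these toy documents, the first two share four of their six terms and fall into one cluster, while the third starts its own. Any standard clustering algorithm could replace the greedy rule; the point is only that the document representation chosen in step 1 determines what step 2 can see.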
In this paper, our focus is to study how to represent temporal documents effectively for event detection. Classical text representation methods, i.e., Vector Space Model (VSM), have a few shortcomings when dealing with temporal documents. The major one is that it maps one dimension to one term, which completely ignores temporal information, and therefore VSM can never capture the evolving trends in text streams. See the example in Figure 1, D1 and D2 ...

* Corresponding author.
[1] Post-processing may be ...
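The shortcoming named above, that a term-based VSM ignores temporal information, can be made concrete with a small sketch. The two documents and their dates below are hypothetical and are not the D1 and D2 of Figure 1; the term-frequency weighting is likewise an assumption.

```python
import math
from collections import Counter
from datetime import date

def vsm_cosine(text_a, text_b):
    """Cosine similarity over term-frequency vectors; timestamps play no role."""
    u, v = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * \
           math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

# Two hypothetical news items with identical wording, published years apart.
doc_a = {"date": date(2001, 3, 1), "text": "olympic games opening ceremony"}
doc_b = {"date": date(2009, 8, 1), "text": "olympic games opening ceremony"}

similarity = vsm_cosine(doc_a["text"], doc_b["text"])
gap_days = (doc_b["date"] - doc_a["date"]).days
print(similarity, gap_days)
```

Because every dimension corresponds to a term, the similarity is maximal even though the two items are years apart and likely describe different events. A clustering step built on this representation has no way to separate them.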
