TAILIEUCHUNG - Báo cáo khoa học: "Online Generation of Locality Sensitive Hash Signatures"

Motivated by the recent interest in streaming algorithms for processing large text collections, we revisit the work of Ravichandran et al. (2005) on using the Locality Sensitive Hash (LSH) method of Charikar (2002) to enable fast, approximate comparisons of vector cosine similarity. For the common case of feature updates being additive over a data stream, we show that LSH signatures can be maintained online, without additional approximation error, and with lower memory requirements than when using the standard offline technique. . | Online Generation of Locality Sensitive Hash Signatures Benjamin Van Durme HLTCOE Johns Hopkins University Baltimore MD 21211 USA Ashwin Lall College of Computing Georgia Institute of Technology Atlanta GA 30332 USA Abstract Motivated by the recent interest in streaming algorithms for processing large text collections we revisit the work of Ravichandran et al. 2005 on using the Locality Sensitive Hash LSH method of Charikar 2002 to enable fast approximate comparisons of vector cosine similarity. For the common case of feature updates being additive over a data stream we show that LSH signatures can be maintained online without additional approximation error and with lower memory requirements than when using the standard offline technique. 1 Introduction There has been a surge of interest in adapting results from the streaming algorithms community to problems in processing large text collections. The term streaming refers to a model where data is made available sequentially and it is assumed that resource limitations preclude storing the entirety of the data for offline batch processing. Statistics of interest are approximated via online randomized algorithms. Examples of text applications include collecting approximate counts Talbot 2009 Van Durme and Lall 2009a finding top-n elements Goyal et al. 2009 estimating term co-occurrence Li et al. 2008 adaptive language modeling Levenberg and Osborne 2009 and building top-k ranklists based on pointwise mutual information Van Durme and Lall 2009b . Here we revisit the work of Ravichandran et al. 2005 on building word similarity measures from large text collections by using the Locality Sensitive Hash LSH method of Charikar 2002 . For the common case of feature updates being additive over a data stream such as when tracking lexical co-occurrence we show that LSH signatures can be maintained online without additional approximation error and with lower memory requirements than when using the standard offline technique. We .

TỪ KHÓA LIÊN QUAN
TAILIEUCHUNG - Chia sẻ tài liệu không giới hạn
Địa chỉ : 444 Hoang Hoa Tham, Hanoi, Viet Nam
Website : tailieuchung.com
Email : tailieuchung20@gmail.com
Tailieuchung.com là thư viện tài liệu trực tuyến, nơi chia sẽ trao đổi hàng triệu tài liệu như luận văn đồ án, sách, giáo trình, đề thi.
Chúng tôi không chịu trách nhiệm liên quan đến các vấn đề bản quyền nội dung tài liệu được thành viên tự nguyện đăng tải lên, nếu phát hiện thấy tài liệu xấu hoặc tài liệu có bản quyền xin hãy email cho chúng tôi.
Đã phát hiện trình chặn quảng cáo AdBlock
Trang web này phụ thuộc vào doanh thu từ số lần hiển thị quảng cáo để tồn tại. Vui lòng tắt trình chặn quảng cáo của bạn hoặc tạm dừng tính năng chặn quảng cáo cho trang web này.