Data Streams: Models and Algorithms - P11

In recent years, progress in hardware technology has made it possible for organizations to store and record large streams of transactional data. Such data sets, which grow continuously and rapidly over time, are referred to as data streams. In addition, the development of sensor technology has made it possible to monitor many events in real time.

... sensors, credit card transactions, or from networked systems. To benefit from these enhanced data collection capabilities, it is clear that semi-automated interactive techniques such as data mining should be employed to process and analyze the data. It is also desirable to have interactive response times to client queries, as the process is often iterative in nature (with a human in the loop). The challenges in meeting these criteria are often daunting, as detailed next.

Although inexpensive storage space makes it possible to maintain vast volumes of data, accessing and managing the data becomes a performance issue. Often a single node is incapable of housing such large datasets. Efficient and adaptive techniques for data access, data storage, and communication (if the data sources are distributed) are thus necessary. Moreover, data mining becomes more complicated in the context of dynamic databases, where there is a constant influx of data. Changes in the data can invalidate existing patterns or introduce new ones. Re-executing the algorithms from scratch leads to large computational and I/O overheads. These two factors have led to the development of distributed algorithms for analyzing streaming data, which is the focus of this survey article.

Many systems use a centralized model for mining multiple data streams [2]. Under this model, the distributed data streams are directed to one central location before they are mined. A schematic diagram of a centralized data stream mining system is presented in Figure .
Such a model of computation is limited in several respects. First, centralized mining of data streams can result in long response times. While distributed computing resources may be available, they are not fully utilized. Second, central collection of data can result in heavy traffic over critical communication links. If these communication links have limited network bandwidth, network I/O may become a performance bottleneck.
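The communication cost described above can be illustrated with a small sketch. It is not the method of any system cited in the text. It assumes a toy mining task (counting item frequencies) and hypothetical function names, `centralized_mining` and `distributed_mining`, and it counts one "message" per unit of data that crosses the network. In the centralized case every raw item is forwarded; in the distributed case each node sends only its local summary.

```python
from collections import Counter

def centralized_mining(streams):
    """Forward every raw item to one central node and count there.

    Hypothetical cost model: one network message per raw item.
    """
    messages_sent = 0
    central_counts = Counter()
    for stream in streams:
        for item in stream:
            messages_sent += 1           # raw item crosses the network
            central_counts[item] += 1    # mining happens only at the center
    return central_counts, messages_sent

def distributed_mining(streams):
    """Count locally at each source node, then ship only the summaries.

    Hypothetical cost model: one network message per distinct local item.
    """
    messages_sent = 0
    merged_counts = Counter()
    for stream in streams:
        local_counts = Counter(stream)     # computed at the source node
        messages_sent += len(local_counts) # only the summary is transmitted
        merged_counts.update(local_counts) # merge step adds the counts
    return merged_counts, messages_sent

# Three toy streams standing in for three distributed sources (made-up data).
streams = [
    ["a", "b", "a", "a", "c"],
    ["b", "b", "a", "b"],
    ["c", "c", "c", "a", "c", "c"],
]

central, central_msgs = centralized_mining(streams)
merged, distributed_msgs = distributed_mining(streams)

print(central == merged)               # both approaches give the same counts
print(central_msgs, distributed_msgs)  # messages sent under each model
```

Under these assumptions both approaches produce identical counts, but the centralized version transmits one message per stream item while the distributed version transmits one per distinct item per node. Real systems differ in what summary is sent and how often, so the gap shown here is only indicative.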
