Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 36

Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered: this is the challenge created by today's abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

330 Bart Goethals

... since a set that is frequent in the complete database must be relatively frequent in one of the parts. Finally, the actual supports of all sets are computed during a second scan through the database.

Although the covers of all items can be stored in main memory, during the generation of all local frequent sets for every part it is still possible that the covers of all local candidate k-sets cannot be stored in main memory. Also, the algorithm is highly dependent on the heterogeneity of the database and can generate too many local frequent sets, resulting in a significant decrease in performance. However, if the complete database fits into main memory and the total of all covers at any iteration also does not exceed main memory limits, then the database need not be partitioned at all and the algorithm essentially comes down to Eclat.

Sampling

Another technique to solve Apriori's slow counting and Eclat's large memory requirements is to use sampling, as proposed by Toivonen (Toivonen, 1996). The presented Sampling algorithm picks a random sample from the database, then finds all relatively frequent patterns in that sample, and then verifies the results with the rest of the database.
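
The sample-then-verify idea can be illustrated with a minimal Python sketch. It is not Toivonen's algorithm as published: the names `frequent_sets`, `sample_and_verify`, `sample_size`, `lowered_frac` and `seed` are hypothetical, a naive level-wise miner stands in for Eclat or Apriori, supports are counted over the whole database rather than only the part outside the sample, and the case where the sample misses a frequent set is not handled.

```python
import random
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_sets(transactions, min_count):
    """Naive level-wise miner (a stand-in for any frequent set miner)."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]
    result = {}
    while level:
        frequent = {}
        for s in level:
            count = support(s, transactions)
            if count >= min_count:
                frequent[s] = count
        result.update(frequent)
        # Join frequent k-sets that differ in one item into (k+1)-candidates.
        level = list({a | b for a, b in combinations(frequent, 2)
                      if len(a | b) == len(a) + 1})
    return result


def sample_and_verify(transactions, min_frac, sample_size, lowered_frac, seed=0):
    """Mine a random sample, then verify candidates on the full database."""
    rng = random.Random(seed)
    sample = rng.sample(transactions, sample_size)
    # Step 1: mine the sample at a lowered relative threshold (assumed
    # parameter) to reduce the chance of missing a truly frequent set.
    candidates = frequent_sets(sample, lowered_frac * len(sample))
    # Step 2: one pass over the full database to get the true supports.
    min_count = min_frac * len(transactions)
    verified = {}
    for s in candidates:
        count = support(s, transactions)
        if count >= min_count:
            verified[s] = count
    return verified
```

Transactions are assumed to be a list of `frozenset` objects. Lowering `lowered_frac` relative to `min_frac` enlarges the candidate set that must be verified in the second step, which is the trade-off the text describes.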
In the cases where the sampling method does not produce all frequent sets, the missing sets can be found by generating all remaining potentially frequent sets and verifying their supports during a second pass through the database. The probability of such a failure can be kept small by decreasing the minimal support threshold used when mining the sample. However, for a reasonably small probability of failure, the threshold must be drastically decreased, which can cause a combinatorial explosion of the number of candidate patterns. Nevertheless, in practice, finding all frequent patterns within a small sample of the database can be done very fast using Eclat or any other efficient frequent set mining algorithm. In the next step, all true supports of these patterns must be counted, after which the standard ...
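
Since the excerpt relies on Eclat and on the covers of items, a minimal sketch of Eclat-style mining by cover intersection may help. It is an illustrative simplification, not the chapter's presentation: the function names `eclat` and `mine_with_covers` are hypothetical, covers are held as plain Python sets of transaction ids, and no ordering or memory optimizations are applied.

```python
def eclat(prefix, items, min_count, out):
    """Depth-first search over (item, cover) pairs.

    `items` is a list of (item, cover) pairs, where each cover is the set
    of transaction ids containing `prefix` plus that item.
    """
    while items:
        item, cover = items.pop()
        if len(cover) >= min_count:
            new_prefix = prefix | {item}
            out[new_prefix] = len(cover)
            # The cover of a larger set is the intersection of covers.
            suffix = [(other, cover & other_cover)
                      for other, other_cover in items]
            eclat(new_prefix, suffix, min_count, out)
    return out


def mine_with_covers(transactions, min_count):
    """Build the cover of every item in one scan, then run the search."""
    covers = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            covers.setdefault(item, set()).add(tid)
    items = sorted(covers.items(), key=lambda pair: pair[0])
    return eclat(frozenset(), items, min_count, {})
```

The support of a set is simply the size of its cover, so no further database scan is needed once the item covers are in memory. This is also why the memory needed for covers is the limiting factor mentioned earlier.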
