Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 66

Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered; this is the challenge created by today's abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

Maria Halkidi and Michalis Vazirgiannis

For indices that increase or decrease as the number of clusters increases, we search for the values of nc at which a significant local change in the value of the index occurs. This change appears as a "knee" in the plot, and it is an indication of the number of clusters underlying the data set. Moreover, the absence of a knee may be an indication that the data set possesses no clustering structure. Below, some representative validity indices for crisp and fuzzy clustering are presented.

Crisp Clustering

Crisp clustering considers non-overlapping partitions, meaning that a data point either belongs to a class or not. In this section we discuss validity indices suitable for crisp clustering.

The modified Hubert Γ statistic

The definition of the modified Hubert Γ statistic (Theodoridis and Koutroubas, 1999) is given by the equation

$$\Gamma = \frac{1}{M} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} P(i,j)\, Q(i,j)$$

where $N$ is the number of objects in a data set, $M = N(N-1)/2$, $P$ is the proximity matrix of the data set, and $Q$ is an $N \times N$ matrix whose $(i, j)$ element is equal to the distance between the representative points $(v_{c_i}, v_{c_j})$ of the clusters to which the objects $x_i$ and $x_j$ belong.

Similarly, we can define the normalized Hubert Γ statistic, given by the equation

$$\hat{\Gamma} = \frac{\frac{1}{M} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \bigl(P(i,j) - \mu_P\bigr)\bigl(Q(i,j) - \mu_Q\bigr)}{\sigma_P\, \sigma_Q}$$

where $\mu_P$, $\mu_Q$ are the respective means and $\sigma_P$, $\sigma_Q$ the respective standard deviations of the entries of the $P$ and $Q$ matrices.

If $d(v_{c_i}, v_{c_j})$ is close to $d(x_i, x_j)$ for $i, j = 1, 2, \ldots, N$, then $P$ and $Q$ will be in close agreement and the values of $\Gamma$ and $\hat{\Gamma}$ (normalized Γ) will be high. Conversely, a high value of $\Gamma$ ($\hat{\Gamma}$) indicates the existence of compact clusters. Thus, in the plot of normalized Γ versus nc, we seek a significant knee that corresponds to a significant increase of normalized Γ. The number of clusters at which the knee occurs is an indication of the number of clusters present in the data. We note that for nc = 1 and nc = N the index is not defined.

Dunn family of indices

A cluster validity index for crisp clustering proposed in (Dunn, 1974) aims at the identification of ...
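Returning to the modified Hubert Γ statistic and its normalized form defined earlier, the following is a minimal Python sketch of how the two quantities might be computed for a given crisp partition. It is an illustration under stated assumptions, not the authors' implementation: the function name `hubert_gamma` is hypothetical, the proximity matrix P is assumed to hold Euclidean distances, and each cluster's representative point is assumed to be its centroid.

```python
import numpy as np


def hubert_gamma(X, labels):
    """Return (gamma, gamma_hat) for a crisp partition of the rows of X.

    Assumptions (not fixed by the text): Euclidean distance for P,
    and the centroid as the representative point of each cluster.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).ravel()
    n = X.shape[0]

    # Map each object to the centroid of the cluster it belongs to.
    classes, inverse = np.unique(labels, return_inverse=True)
    centroids = np.array([X[inverse == k].mean(axis=0)
                          for k in range(len(classes))])
    V = centroids[inverse]

    # P: pairwise distances between objects.
    # Q: pairwise distances between the objects' cluster representatives.
    P = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    Q = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=2)

    # Keep only the M = N(N-1)/2 pairs with i < j.
    upper = np.triu_indices(n, k=1)
    p, q = P[upper], Q[upper]

    gamma = float(np.mean(p * q))

    # The normalized statistic divides by the standard deviations of the
    # entries of P and Q; it is undefined if either one is zero
    # (for example, when all objects fall in a single cluster).
    sp, sq = p.std(), q.std()
    if sp == 0 or sq == 0:
        return gamma, float("nan")
    gamma_hat = float(np.mean((p - p.mean()) * (q - q.mean())) / (sp * sq))
    return gamma, gamma_hat


if __name__ == "__main__":
    points = [[0, 0], [0, 1], [10, 0], [10, 1]]
    print(hubert_gamma(points, [0, 0, 1, 1]))  # partition matching the two groups
    print(hubert_gamma(points, [0, 1, 0, 1]))  # partition cutting across them
```

To look for the knee described in the text, one could run a clustering algorithm for a range of nc values, call this function on each resulting partition, and plot the normalized value against nc.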
