Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 30

Knowledge Discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of Information Technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered; this is the challenge created by today's abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

Lior Rokach

Clustering of objects is as ancient as the human need for describing the salient characteristics of men and objects and identifying them with a type. It therefore embraces various scientific disciplines, from mathematics and statistics to biology and genetics, each of which uses different terms to describe the topologies formed using this analysis. From biological "taxonomies" to medical "syndromes" and genetic "genotypes" to manufacturing "group technology", the problem is identical: forming categories of entities and assigning individuals to the proper groups within them.

Distance Measures

Since clustering is the grouping of similar instances/objects, some sort of measure that can determine whether two objects are similar or dissimilar is required. There are two main types of measures used to estimate this relation: distance measures and similarity measures.

Many clustering methods use distance measures to determine the similarity or dissimilarity between any pair of objects. It is useful to denote the distance between two instances $x_i$ and $x_j$ as $d(x_i, x_j)$. A valid distance measure should be symmetric and should attain its minimum value (usually zero) in the case of identical vectors. The distance measure is called a metric distance measure if it also satisfies the following properties:

1. Triangle inequality: $d(x_i, x_k) \le d(x_i, x_j) + d(x_j, x_k) \quad \forall x_i, x_j, x_k \in S$.
2. $d(x_i, x_j) = 0 \Rightarrow x_i = x_j \quad \forall x_i, x_j \in S$.

Minkowski Distance Measures for Numeric Attributes

Given two p-dimensional instances, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ and $x_j = (x_{j1}, x_{j2}, \ldots, x_{jp})$, the distance between the two data instances can be calculated using the Minkowski metric (Han and Kamber, 2001):

$$d(x_i, x_j) = \left( |x_{i1} - x_{j1}|^g + |x_{i2} - x_{j2}|^g + \cdots + |x_{ip} - x_{jp}|^g \right)^{1/g}$$

The commonly used Euclidean distance between two objects is obtained when $g = 2$. Given $g = 1$, the sum of absolute paraxial distances (Manhattan metric) is obtained, and with $g = \infty$ one gets the greatest of the paraxial distances (Chebychev metric).

The measurement unit used can affect the clustering analysis. To avoid the dependence on
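
The Minkowski family of distances described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the handbook: the function name `minkowski_distance`, the sample vectors, and the decision to reject g < 1 (where the triangle inequality can fail) are all assumptions made for the example.

```python
import math
from typing import Sequence


def minkowski_distance(xi: Sequence[float], xj: Sequence[float], g: float = 2.0) -> float:
    """Minkowski distance of order g between two equal-length numeric instances."""
    if len(xi) != len(xj):
        raise ValueError("instances must have the same number of attributes")
    if g < 1:
        # Assumption for this sketch: only orders g >= 1 are accepted.
        raise ValueError("g must be >= 1")
    # Absolute per-attribute (paraxial) differences.
    diffs = [abs(a - b) for a, b in zip(xi, xj)]
    if math.isinf(g):
        # Limit g -> infinity: the greatest paraxial distance (Chebychev).
        return max(diffs, default=0.0)
    return sum(d ** g for d in diffs) ** (1.0 / g)


# Hypothetical sample instances with p = 3 attributes.
xi = (1.0, 2.0, 3.0)
xj = (4.0, 6.0, 3.0)

manhattan = minkowski_distance(xi, xj, g=1)         # 3 + 4 + 0
euclidean = minkowski_distance(xi, xj, g=2)         # sqrt(9 + 16 + 0)
chebyshev = minkowski_distance(xi, xj, g=math.inf)  # max(3, 4, 0)
```

For these two instances the per-attribute differences are 3, 4 and 0, so the three special cases of g reduce to a sum, a square root of a sum of squares, and a maximum, respectively.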
