Advances in Database Technology - P4

P. Andritsos et al.

Table 3. The director attribute

director  | …   | …   | …   | …   | g.T | g.C | p(d)
Scorsese  | 1/2 | 0   | 0   | 0   | 1/2 | 0   | 1/6
Coppola   | 1/2 | 0   | 0   | 0   | 1/2 | 0   | 1/6
Hitchcock | 0   | 1/3 | 1/3 | 0   | 2/3 | 0   | 2/6
Koster    | 0   | 1/3 | 1/3 | 0   | 0   | 2/3 | 2/6

Formally, let $A'$ be the attribute of interest, and let $\mathbf{A}'$ denote the set of values of attribute $A'$. Also let $\tilde{\mathbf{A}} = \mathbf{A} \setminus \mathbf{A}'$ denote the set of attribute values for the remaining attributes. For the example of the movie database, if $A'$ is the director attribute, then $\mathbf{A}'$ = {Scorsese, Coppola, Hitchcock, Koster} and $\tilde{\mathbf{A}}$ holds the values of the remaining attributes. Let $A'$ and $\tilde{A}$ be random variables that range over $\mathbf{A}'$ and $\tilde{\mathbf{A}}$ respectively, and let $p(\tilde{A}|v)$ denote the distribution that value $v \in \mathbf{A}'$ induces on the values in $\tilde{\mathbf{A}}$. For some $a \in \tilde{\mathbf{A}}$, $p(a|v)$ is the fraction of the tuples in $T$ that contain $v$ and also contain value $a$. Also, for some $v \in \mathbf{A}'$, $p(v)$ is the fraction of tuples in $T$ that contain the value $v$. Table 3 shows an example of a table when $A'$ is the director attribute.

For two values $v_1, v_2 \in \mathbf{A}'$, we define the distance between $v_1$ and $v_2$ to be the information loss incurred about the variable $\tilde{A}$ if we merge values $v_1$ and $v_2$. This is equal to the increase in the uncertainty of predicting the values of variable $\tilde{A}$ when we replace values $v_1$ and $v_2$ with $v_1 \lor v_2$. In the movie example, Scorsese and Coppola are the most similar directors.

The definition of a distance measure for categorical attribute values is a contribution in itself, since it imposes some structure on an inherently unstructured problem. We can define a distance measure between tuples as the sum of the distances of the individual attributes. Another possible application is to cluster intra-attribute values. For example, in a movie database we may be interested in discovering clusters of directors or actors, which in turn could help in improving the classification of movie tuples.
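The quantities defined above, p(v), p(a|v) and the information loss of a merge, can be sketched in Python as follows. This is only an illustration under stated assumptions: the movie tuples are hypothetical sample data, the function names are invented for this sketch, each conditional distribution is normalized so that it sums to 1, and the merge cost is taken to be the usual Information Bottleneck form, (p(v1) + p(v2)) times the Jensen-Shannon divergence of the two conditional distributions, which the text above does not spell out.

```python
from collections import Counter, defaultdict
from math import log2

# Hypothetical (director, actor, genre) tuples, for illustration only.
TUPLES = [
    ("Scorsese", "De Niro", "Crime"),
    ("Coppola", "De Niro", "Crime"),
    ("Hitchcock", "Stewart", "Thriller"),
    ("Hitchcock", "Grant", "Thriller"),
    ("Koster", "Grant", "Comedy"),
    ("Koster", "Stewart", "Comedy"),
]


def distributions(tuples, attr=0):
    """Return (p_v, p_a_given_v) for the attribute at position `attr`."""
    counts = defaultdict(Counter)
    for t in tuples:
        v = t[attr]
        for i, a in enumerate(t):
            if i != attr:
                # Key by (position, value) so equal strings in different
                # attributes are never confused.
                counts[v][(i, a)] += 1
    n = len(tuples)
    # p(v): fraction of tuples that contain the value v.
    p_v = {v: sum(1 for t in tuples if t[attr] == v) / n for v in counts}
    # p(a|v): co-occurrence counts, normalized to sum to 1 (an assumption).
    p_a_given_v = {}
    for v, c in counts.items():
        total = sum(c.values())
        p_a_given_v[v] = {a: k / total for a, k in c.items()}
    return p_v, p_a_given_v


def kl(p, q):
    """Kullback-Leibler divergence in bits; q must cover the support of p."""
    return sum(pa * log2(pa / q[a]) for a, pa in p.items() if pa > 0)


def information_loss(v1, v2, p_v, p_a_given_v):
    """Assumed merge cost: (p(v1) + p(v2)) * weighted Jensen-Shannon divergence."""
    w = p_v[v1] + p_v[v2]
    pi1, pi2 = p_v[v1] / w, p_v[v2] / w
    d1, d2 = p_a_given_v[v1], p_a_given_v[v2]
    mix = {a: pi1 * d1.get(a, 0.0) + pi2 * d2.get(a, 0.0)
           for a in set(d1) | set(d2)}
    js = pi1 * kl(d1, mix) + pi2 * kl(d2, mix)
    return w * js


p_v, cond = distributions(TUPLES, attr=0)
for pair in [("Scorsese", "Coppola"), ("Scorsese", "Hitchcock"),
             ("Hitchcock", "Koster")]:
    print(pair, round(information_loss(*pair, p_v, cond), 4))
```

On this sample data the two directors with identical conditional distributions incur no information loss when merged, which is the sense in which Scorsese and Coppola are the most similar.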
Given the joint distribution of random variables $A'$ and $\tilde{A}$, we can apply the LIMBO algorithm for clustering the values of attribute $A'$. Merging two values $v_1, v_2 \in \mathbf{A}'$ produces a new value $v_1 \lor v_2$, where $p(v_1 \lor v_2) = p(v_1) + p(v_2)$, since $v_1$ and $v_2$ never appear together.
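The merge step can be sketched as a small Python function. The rule for the merged probability is the one stated above. The rule for the merged conditional distribution, a mixture of the two original distributions weighted by p(v1) and p(v2), is an assumption of this sketch and is not given in the text above. The function name and the sample inputs are hypothetical.

```python
from fractions import Fraction


def merge_values(p1, cond1, p2, cond2):
    """Merge two values v1, v2 of one attribute into the new value v1 ∨ v2.

    p1, p2       -- p(v1), p(v2)
    cond1, cond2 -- dicts mapping each remaining value a to p(a|v1), p(a|v2)
    """
    # v1 and v2 never appear in the same tuple, so their probabilities add.
    p_merged = p1 + p2
    # Assumed mixture rule for p(a | v1 ∨ v2), weighted by p(v1) and p(v2).
    support = set(cond1) | set(cond2)
    cond_merged = {
        a: (p1 * cond1.get(a, 0) + p2 * cond2.get(a, 0)) / p_merged
        for a in support
    }
    return p_merged, cond_merged


# Hypothetical inputs: two directors that each occur in 1 of 6 tuples and
# share the same conditional distribution over the remaining values.
half = Fraction(1, 2)
p_merged, cond_merged = merge_values(
    Fraction(1, 6), {"De Niro": half, "Crime": half},
    Fraction(1, 6), {"De Niro": half, "Crime": half},
)
print(p_merged, cond_merged)
```

Exact fractions are used so that the merged distribution can be checked to sum to 1 without floating-point tolerance.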
