Such analysis can help provide us with a better understanding of the data at large. Whereas classification predicts categorical (discrete, unordered) labels, prediction models continuous-valued functions. For example, we can build a classification model to categorize bank loan applications as either safe or risky, [...]

Chapter 6 Classification and Prediction, p. 362

[...] cancerous patient is not cancerous is far greater than that of a false positive (incorrectly, yet conservatively, labeling a noncancerous patient as cancerous). In such cases, we can weight one type of error more heavily than another by assigning a different cost to each. These costs may consider the danger to the patient, the financial costs of resulting therapies, and other hospital costs. Similarly, the benefits associated with a true positive decision may differ from those of a true negative. Up to now, to compute classifier accuracy, we have assumed equal costs and essentially divided the sum of true positives and true negatives by the total number of test tuples. Alternatively, we can incorporate costs and benefits by instead computing the average cost (or benefit) per decision.

Other applications involving cost-benefit analysis include loan application decisions and target marketing mailouts. For example, the cost of loaning to a defaulter greatly exceeds that of the lost business incurred by denying a loan to a nondefaulter. Similarly, in an application that tries to identify households that are likely to respond to mailouts of certain promotional material, the cost of mailouts to numerous households that do not respond may outweigh the cost of lost business from not mailing to households that would have responded. Other costs to consider in the overall analysis include the costs to collect the data and to develop the classification tool.
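The difference between equal-cost accuracy and average cost per decision can be made concrete with a minimal sketch. Everything named here is an assumption for illustration and does not come from the text: the `ConfusionCounts` container, the two hypothetical classifiers `model_a` and `model_b`, their counts, and the cost table, in which a false negative is assumed to cost ten times as much as a false positive.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class ConfusionCounts:
    """Counts of the four decision outcomes on a test set (hypothetical container)."""
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives

    @property
    def total(self) -> int:
        return self.tp + self.tn + self.fp + self.fn


def accuracy(c: ConfusionCounts) -> float:
    """Equal-cost accuracy: (TP + TN) / total number of test tuples."""
    return (c.tp + c.tn) / c.total


def average_cost(c: ConfusionCounts, cost: Mapping[str, float]) -> float:
    """Average cost per decision. A negative entry in `cost` acts as a benefit."""
    weighted = (
        c.tp * cost["tp"]
        + c.tn * cost["tn"]
        + c.fp * cost["fp"]
        + c.fn * cost["fn"]
    )
    return weighted / c.total


# Made-up counts for two classifiers evaluated on the same 1000 test tuples.
model_a = ConfusionCounts(tp=80, tn=890, fp=10, fn=20)
model_b = ConfusionCounts(tp=95, tn=860, fp=40, fn=5)

# Assumed cost table: correct decisions are free, a false negative costs 10x a false positive.
costs = {"tp": 0.0, "tn": 0.0, "fp": 1.0, "fn": 10.0}

for name, model in (("A", model_a), ("B", model_b)):
    print(name, round(accuracy(model), 3), round(average_cost(model, costs), 3))
```

Under these assumed numbers, classifier A has the higher accuracy (0.97 versus 0.955), but classifier B has the lower average cost per decision (0.09 versus 0.21) because it makes fewer of the expensive false negatives. The ranking depends entirely on the cost table chosen, which in practice would have to come from the application.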
"Are there other cases where accuracy may not be appropriate?" In classification problems, it is commonly assumed that all tuples are uniquely classifiable, that is, that each training tuple can belong to only one class. Yet, owing to the wide diversity of data in large databases, it is not always reasonable to assume that all tuples are uniquely classifiable. Rather, it is more probable to assume that each tuple may belong to more than one class. How, then, can the accuracy of classifiers on large databases be [...]