Chapter 2 Preprocessing Data

In the real world of data-mining applications, more effort is expended preparing data than applying a prediction program to data. Data mining methods are quite capable of finding valuable patterns in data. It is straightforward to apply a method to data and then judge the value of its results based on the estimated predictive performance. This does not diminish the role of careful attention to data preparation. While the prediction methods may have very strong theoretical capabilities, in practice all these methods may be limited by a shortage of data relative to the unlimited space of possibilities that they may search.

2.1 Data Quality

To a large extent, the design and organization of data, including the setting of goals and the composition of features, is done by humans. There are two central goals for the preparation of data: to organize data into a standard form that is ready for processing by data mining programs, and to prepare features that lead to the best predictive performance. It's easy to specify a standard form that is compatible with most prediction methods. It's much harder to generalize concepts for composing the most predictive features.

A Standard Form. A standard form helps to understand the advantages and limitations of different prediction techniques and how they reason with data. The standard form model of data constrains our world's view.
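
To make the idea of a standard form concrete, here is a minimal Python sketch. It assumes that a standard form means a table with one row per case, every feature encoded as a number, and a separate goal value per case. The field names (age, smoker, region, label), the sample records, and the helper functions are all hypothetical and chosen only for illustration; they do not come from the chapter.

```python
# Hypothetical raw cases; field names and values are illustrative only.
raw_cases = [
    {"age": 34, "smoker": "yes", "region": "north", "label": "high_risk"},
    {"age": 51, "smoker": "no", "region": "south", "label": "low_risk"},
    {"age": 27, "smoker": "no", "region": "north", "label": "low_risk"},
]


def build_code_table(values):
    """Map each distinct value to an integer code, assigned in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


def to_standard_form(cases):
    """Return (rows, goals): numeric feature rows and numeric class goals."""
    region_codes = build_code_table(c["region"] for c in cases)
    label_codes = build_code_table(c["label"] for c in cases)
    rows, goals = [], []
    for c in cases:
        rows.append([
            float(c["age"]),                   # numerical feature
            1 if c["smoker"] == "yes" else 0,  # binary true-or-false feature
            region_codes[c["region"]],         # numeric code for a category
        ])
        goals.append(label_codes[c["label"]])  # explicit goal for classification
    return rows, goals


rows, goals = to_standard_form(raw_cases)
print(rows)   # one numeric row per case
print(goals)  # one numeric goal per case
```

In this sketch the integer codes for region are arbitrary labels, so a method that treats them as ordered quantities could be misled; whether such codes are appropriate depends on the prediction method being used.
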
To find the best set of features, it is important to examine the types of features that fit this model of data so that they may be manipulated to increase predictive performance. Most prediction methods require that data be in a standard form with standard types of measurements. The features must be encoded in a numerical format such as binary true-or-false features, numerical features, or possibly numeric codes. In addition, for classification, a clear goal must be specified. Prediction methods may differ greatly, but they share a common perspective. Their view of the world is cases organized