Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 12

Data Mining and Knowledge Discovery Handbook, 2nd Edition, part 12. Knowledge discovery demonstrates intelligent computing at its best, and is the most desirable and interesting end-product of information technology. To be able to discover and to extract knowledge from data is a task that many researchers and practitioners are endeavoring to accomplish. There is a lot of hidden knowledge waiting to be discovered – this is the challenge created by today’s abundance of data. Data Mining and Knowledge Discovery Handbook, 2nd Edition organizes the most current concepts, theories, standards, methodologies, trends, challenges and applications of data mining (DM) and knowledge discovery.

90 Barak Chizi and Oded Maimon

The selection criterion is the expected cross entropy (Equation 5.2):

$$\delta_i = \sum_{v_i,\, m_i} \Pr(V_i = v_i,\, M_i = m_i)\; D\big(\Pr(C \mid V_i = v_i,\, M_i = m_i),\ \Pr(C \mid M_i = m_i)\big) \tag{5.2}$$

where the cross entropy between two distributions $p$ and $q$ over the class values is $D(p, q) = \sum_c p(c) \log_2 \frac{p(c)}{q(c)}$.

For each feature i, the algorithm finds a set M_i containing K attributes from those that remain that is likely to include the information feature i has about the class values; M_i consists of the K remaining features for which the value of Equation 5.2 is smallest. The expected cross entropy between the distribution of the class values given M_i ∪ {V_i} and the distribution of the class values given just M_i is calculated for each feature i. The feature for which this quantity is minimal is removed from the set. This process iterates until the user-specified number of features has been removed from the original set. Experiments on natural domains and two artificial domains, using C4.5 and naive Bayes as the final induction algorithms, showed that the feature selector gives the best results when the size K of the conditioning set M is set to 2. In two domains containing over 1000 features, the algorithm was able to reduce the number of features by more than half while improving accuracy by one or two percent.
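The backward-elimination loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it brute-forces the search for each K-subset M_i over all combinations of the remaining features, estimates all probabilities empirically from the data, and uses illustrative helper names (`kl_div`, `class_dist`, `expected_cross_entropy`, `select_features`).

```python
import numpy as np
from itertools import combinations

def kl_div(p, q, eps=1e-12):
    """Cross entropy D(p, q) = sum_c p(c) * log2(p(c) / q(c))."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    return float(np.sum(p * np.log2(p / q)))

def class_dist(X, y, cols, vals, classes):
    """Empirical P(C | X[cols] = vals); uniform if no matching rows."""
    mask = np.all(X[:, cols] == vals, axis=1)
    if not mask.any():
        return np.full(len(classes), 1.0 / len(classes))
    return np.array([(y[mask] == c).mean() for c in classes])

def expected_cross_entropy(X, y, i, M, classes):
    """Expected D( P(C | V_i, M), P(C | M) ), averaged over the data."""
    total, seen = 0.0, set()
    for r in range(X.shape[0]):
        key = (X[r, i],) + tuple(X[r, M])
        if key in seen:
            continue
        seen.add(key)
        # empirical P(V_i = v_i, M = m) for this value combination
        prob = np.mean(np.all(X[:, [i] + M] == np.array(key), axis=1))
        p = class_dist(X, y, [i] + M, np.array(key), classes)
        q = class_dist(X, y, M, np.array(key[1:]), classes)
        total += prob * kl_div(p, q)
    return total

def select_features(X, y, n_remove, K=2):
    """Backward elimination: repeatedly drop the feature whose class
    information is best covered by some K-subset of the remaining ones."""
    classes = np.unique(y)
    remaining = list(range(X.shape[1]))
    for _ in range(n_remove):
        best_i, best_score = None, None
        for i in remaining:
            others = [j for j in remaining if j != i]
            # M_i: the K-subset minimizing Equation 5.2 for feature i
            score = min(
                expected_cross_entropy(X, y, i, list(M), classes)
                for M in combinations(others, min(K, len(others)))
            )
            if best_score is None or score < best_score:
                best_i, best_score = i, score
        remaining.remove(best_i)
    return remaining
```

On a toy dataset where one feature duplicates another, the duplicate scores an expected cross entropy of zero given its twin and is eliminated first, which is the behaviour the criterion is designed to produce.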
One problem with the algorithm is that it requires features with more than two values to be encoded as binary in order to avoid the bias that entropic measures have toward features with many values. This can greatly increase the number of features in the original data, as well as introduce further dependencies. Furthermore, the meaning of the original attributes is obscured, making the output of algorithms such as C4.5 hard to interpret.

An Instance-Based Approach to Feature Selection - RELIEF

Kira and Rendell (1992) describe an algorithm called RELIEF that uses instance-based learning to assign a relevance weight to each feature. Each feature's weight reflects its ability to distinguish among the class values. Features are ranked by weight, and those that exceed a user-specified threshold are selected to form the final subset. The algorithm
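The weighting scheme behind RELIEF can be sketched as follows. This is a simplified illustration, not Kira and Rendell's exact procedure: it assumes numeric features scaled to [0, 1], a two-class problem, and L1 distance, and the function names and the default threshold are my own choices. Each sampled instance pulls a feature's weight down by its difference from the nearest same-class neighbour (the "near hit") and pushes it up by its difference from the nearest other-class neighbour (the "near miss").

```python
import numpy as np

def relief(X, y, n_samples=None, rng=None):
    """RELIEF-style relevance weights (numeric features, two classes)."""
    rng = rng or np.random.default_rng(0)
    X = np.asarray(X, float)
    # scale each feature to [0, 1] so per-feature diffs are comparable
    lo, span = X.min(axis=0), X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - lo) / span
    n, d = Xs.shape
    m = n_samples or n
    w = np.zeros(d)
    for idx in rng.choice(n, size=m, replace=True):
        x, c = Xs[idx], y[idx]
        dists = np.abs(Xs - x).sum(axis=1)   # L1 distance to every instance
        dists[idx] = np.inf                  # exclude the instance itself
        hits = np.where(y == c)[0]
        misses = np.where(y != c)[0]
        hit = hits[np.argmin(dists[hits])]
        miss = misses[np.argmin(dists[misses])]
        w -= np.abs(x - Xs[hit]) / m         # near hit: penalise differences
        w += np.abs(x - Xs[miss]) / m        # near miss: reward differences
    return w

def select(X, y, threshold=0.1):
    """Keep the features whose RELIEF weight exceeds the threshold."""
    w = relief(X, y)
    return [i for i in range(X.shape[1]) if w[i] > threshold]
```

A feature that tracks the class ends up with a large positive weight (it differs across the class boundary but not within a class), while an irrelevant feature's hit and miss contributions cancel and its weight stays near zero, so thresholding separates the two.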
