TAILIEUCHUNG - Data Preparation for Data Mining- P8

Ever since the Sumerian and Elam peoples living in the Tigris and Euphrates River basin some 5500 years ago invented data collection using dried mud tablets marked with tax records, people have been trying to understand the meaning of, and get use from, collected data. More directly, they have been trying to determine how to use the information in that data to improve their lives and achieve their objectives.

             H     M     L     Total
    T        6     0     0     6
    A        3     8     3     14
    S        0     0     6     6
    Total    9     8     9     26

Figure: Bivariate histogram showing the joint distributions of the categories for weight and height of the Canadiens.

Notice that some of the categories overlap each other. It is these overlaps that allow an appropriate ordering for the categories to be discovered. In this example, since the meaning of the labels is known, the ordering may appear intuitive. However, since the labels are arbitrary, and applied meaningfully only for ease in the example, they can be validly restated. The restated cross-tabulation below shows the same information as the cross-tabulation above, but with different labels and reordered. Is it now intuitively easy to see what the ordering should be?

TABLE: Restated cross-tabulation.

             A     B     C     Total
    X        3     3     8     14
    Y        0     6     0     6
    Z        6     0     0     6
    Total    9     9     8     26

The restated table contains exactly the same information as the original cross-tabulation, but has made intuitive ordering difficult or impossible. It is possible to use this information to reconstruct an appropriate ordering, albeit not intuitively. For ease of understanding, the previous labeling system is used, although the actual labels used, so long as they are consistently applied, are not important to recovering an ordering. Restating the original cross-tabulation in a different form shows how this recovery begins. The category count tabulation below lists the number of players in each of the possible categories.

TABLE: Category count tabulation.

    Weight    Height    Count
    H         T         6
    H         A         3
    H         S         0
    M         T         0
    M         A         8
    M         S         0
    L         T         0
    L         A         3
    L         S         6

The information in the category count tabulation represents a sort of jigsaw puzzle. Although in this example the categories in all of the tables are shown appropriately ordered to clarify the explanation, the real situation is that the ordering is unknown and needs to be discovered. What is known are the various frequencies for each of the category couplings, which are pairings here as
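
The restatement of the cross-tabulation as a category count tabulation can be sketched in a few lines of Python. This is only an illustrative sketch, not the book's own procedure: the names `CROSS_TAB`, `category_counts`, and `marginal_totals` are hypothetical, and the counts are copied from the weight and height tables in the text. It flattens the cross-tabulation into one (weight, height, count) row per category pairing and recomputes the marginal totals as a consistency check.

```python
# Cross-tabulation copied from the text: outer keys are height
# categories (T, A, S), inner keys are weight categories (H, M, L).
WEIGHTS = ["H", "M", "L"]
HEIGHTS = ["T", "A", "S"]
CROSS_TAB = {
    "T": {"H": 6, "M": 0, "L": 0},
    "A": {"H": 3, "M": 8, "L": 3},
    "S": {"H": 0, "M": 0, "L": 6},
}


def category_counts(cross_tab, weights, heights):
    """Flatten a cross-tabulation into (weight, height, count) rows."""
    return [(w, h, cross_tab[h][w]) for w in weights for h in heights]


def marginal_totals(rows):
    """Sum the counts per weight category and per height category."""
    weight_totals, height_totals = {}, {}
    for w, h, n in rows:
        weight_totals[w] = weight_totals.get(w, 0) + n
        height_totals[h] = height_totals.get(h, 0) + n
    return weight_totals, height_totals


rows = category_counts(CROSS_TAB, WEIGHTS, HEIGHTS)
weight_totals, height_totals = marginal_totals(rows)

# Print one line per category pairing, in the same order as the
# category count tabulation (weight first, then height).
for w, h, n in rows:
    print(w, h, n)
```

Because the function only looks labels up by name, relabeling the categories (for example to A, B, C and X, Y, Z) changes the printed labels but not the counts, which is the sense in which the labels are arbitrary.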
