Data Preparation for Data Mining - P5

Data Preparation for Data Mining - P5: Ever since the Sumerian and Elam peoples living in the Tigris and Euphrates River basin some 5500 years ago invented data collection using dried mud tablets marked with tax records, people have been trying to understand the meaning of, and get use from, collected data. More directly, they have been trying to determine how to use the information in that data to improve their lives and achieve their objectives.

... is possible for the output. Usually, the level of detail in the input streams needs to be at least one level of aggregation more detailed than the required level of detail in the output. Knowing the granularity available in the data allows the miner to assess the level of inference or prediction that the data could potentially support. It is only potential support because there are many other factors that will influence the quality of a model, but granularity is particularly important, as it sets a lower bound on what is possible.

For instance, the marketing manager at FNBA is interested, in part, in the weekly variance of predicted approvals to actual approvals. To support this level of detail, the input stream requires at least daily approval information. With daily approval rates available, the miner will also be able to build inferential models when the manager wants to discover the reason for the changing trends.

There are cases where the rule of thumb does not hold, such as predicting Stock Keeping Unit (SKU) sales based on summaries from higher in the hierarchy chain. However, even when these exceptions do occur, the level of granularity still needs to be known.

Consistency

Inconsistent data can defeat any modeling technique until the inconsistency is discovered and corrected. A fundamental problem here is that different things may be represented by the same name in different systems, and the same thing may be represented by different names in different systems. One data assay for a major metropolitan utility revealed that almost 90% of the data volume was, in fact, duplicate. However, it was highly inconsistent, and rationalization itself took a vast effort.

The perspective with which a system of variables (mentioned in Chapter 2) is built has a huge effect on what is intended by the labels attached to the data. Each system is built for a specific purpose, almost certainly different from the purposes of other systems. Variable content, however labeled, is defined by the ...
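
The naming problem described under Consistency can be illustrated with a small sketch. It is not a method from the book: the catalog, the system names (billing, crm), the field names, and the function find_inconsistencies are all hypothetical, and the sketch assumes someone has already recorded an agreed business meaning for each field. Given such a catalog, it flags names that carry different meanings in different systems and meanings that appear under different names.

```python
from collections import defaultdict

# Hypothetical field catalog: (system, field name) -> agreed business meaning.
# All entries are invented for illustration only.
CATALOG = {
    ("billing", "cust_id"): "customer account number",
    ("crm", "account_no"): "customer account number",
    ("billing", "status"): "payment status",
    ("crm", "status"): "contact status",
}


def find_inconsistencies(catalog):
    """Return (homonyms, synonyms) found in a field catalog.

    homonyms: field name -> set of meanings (same name, different things)
    synonyms: meaning -> set of (system, name) (same thing, different names)
    """
    meanings_by_name = defaultdict(set)
    fields_by_meaning = defaultdict(set)
    for (system, name), meaning in catalog.items():
        meanings_by_name[name].add(meaning)
        fields_by_meaning[meaning].add((system, name))

    # A name that maps to more than one meaning is a homonym.
    homonyms = {n: m for n, m in meanings_by_name.items() if len(m) > 1}
    # A meaning that appears under more than one distinct name is a synonym.
    synonyms = {
        m: f
        for m, f in fields_by_meaning.items()
        if len({name for _, name in f}) > 1
    }
    return homonyms, synonyms


homonyms, synonyms = find_inconsistencies(CATALOG)
```

In this made-up catalog, status is flagged as one name with two meanings, and the customer account number is flagged as one meaning stored under two names. In practice the hard part is building the catalog itself, which is where the rationalization effort mentioned above goes.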

