Finding Surprising Patterns in a Time Series Database in Linear Time and Space

Given the benefits that a database system provides for structuring data and preserving its durability and integrity, one might expect to find scientists and engineers making extensive use of database systems to manage their data. Unfortunately, domains such as biology, chemistry, and mechanical engineering (among a variety of others) typically use databases in only the most rudimentary of ways, running few or no queries and storing only raw observations as they are captured from sensors or other field instruments. This is because the real-world data acquired using such measurement infrastructures is typically incomplete, imprecise, and erroneous, and hence rarely usable as it is. The raw data needs to ...

Eamonn Keogh, Stefano Lonardi, Bill Yuan-chi Chiu
Department of Computer Science and Engineering, University of California, Riverside, CA 92521

ABSTRACT

The problem of finding a specified pattern in a time series database (i.e., query by content) has received much attention and is now a relatively mature field. In contrast, the important problem of enumerating all surprising or interesting patterns has received far less attention. This problem requires a meaningful definition of surprise and an efficient search technique. All previous attempts at finding surprising patterns in time series use a very limited notion of surprise and/or do not scale to massive datasets. To overcome these limitations, we introduce a novel technique that defines a pattern as surprising if the frequency of its occurrence differs substantially from that expected by chance, given some previously seen data. This notion has the advantage of not requiring an explicit definition of surprise, which may be impossible to elicit from a domain expert. Instead, the user simply gives the algorithm a collection of previously observed normal data.

Our algorithm uses a suffix tree to efficiently encode the frequency of all observed patterns and allows a Markov model to predict the expected frequency of previously unobserved patterns. Once the suffix tree has been constructed, a measure of surprise for all the patterns in a new database can be determined in time and space linear in the size of the database. We demonstrate the utility of our approach with an extensive experimental evaluation.

Categories and Subject Descriptors: Database Management: Database Applications, Data Mining

Keywords: Time series, Suffix Tree, Novelty Detection, Anomaly Detection, Markov Model, Feature Extraction
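The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the time series is taken to be already discretized into a string of symbols, a plain dictionary of substring counts stands in for the suffix tree (so this version is not linear in time or space), and the Markov fallback and the score (observed count minus expected count) are one plausible reading of the description. The names `substring_counts`, `expected_count`, and `surprise_scores` are hypothetical.

```python
from collections import Counter


def substring_counts(s, max_len):
    """Count every substring of s of length 1..max_len.

    Assumption: a dictionary replaces the suffix tree, trading the
    linear-space guarantee for brevity.
    """
    counts = Counter()
    for i in range(len(s)):
        for m in range(1, min(max_len, len(s) - i) + 1):
            counts[s[i:i + m]] += 1
    return counts


def expected_count(w, ref_counts, ref_len, new_len, order=1):
    """Estimate how often w should occur in a string of length new_len.

    Patterns seen in the reference data are scaled by relative frequency.
    Unseen patterns longer than order + 1 fall back to an order-`order`
    Markov chain built from shorter reference substrings (order >= 1).
    """
    m = len(w)
    positions_ref = ref_len - m + 1
    positions_new = new_len - m + 1
    if positions_ref <= 0 or positions_new <= 0:
        return 0.0
    if ref_counts[w] > 0 or m <= order + 1:
        return ref_counts[w] / positions_ref * positions_new
    # Probability of the first (order + 1)-gram ...
    p = ref_counts[w[:order + 1]] / (ref_len - order)
    # ... times each transition probability along the rest of the pattern.
    for j in range(1, m - order):
        context = ref_counts[w[j:j + order]]
        if context == 0:
            return 0.0
        p *= ref_counts[w[j:j + order + 1]] / context
    return p * positions_new


def surprise_scores(reference, new, length, order=1):
    """Score each length-`length` pattern in `new` as observed minus expected."""
    ref_counts = substring_counts(reference, max(length, order + 1))
    new_counts = substring_counts(new, length)
    return {
        w: c - expected_count(w, ref_counts, len(reference), len(new), order)
        for w, c in new_counts.items()
        if len(w) == length
    }


if __name__ == "__main__":
    reference = "ab" * 20           # "normal" behaviour: strict alternation
    new = "abababbbbbabab"          # contains an unusual run of b's
    scores = surprise_scores(reference, new, length=3)
    for pattern, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(pattern, round(score, 2))
```

In this toy input the run of b's yields patterns that never occur in the reference string, so they receive the highest scores, while the alternating patterns score below zero because they occur less often than the reference data predicts.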




