Survey on Structure and Structure-content Classification of XML Document

ISSN: 2249-5789
Thasleena N T, International Journal of Computer Science & Communication Networks, Vol 4(1), 22-26

Thasleena N T, Department of Computer Science, Rajagiri School of Engineering & Technology, Kochi, thalu555@
Varghese S C, Department of Computer Science, Rajagiri School of Engineering & Technology, Kochi, varghesesc@

Abstract: In recent years, XML has become a popular way of storing many data sets because of its semi-structured nature. It allows a wide variety of databases to be modeled as XML documents. XML data thus form a significant part of the data mining domain, and it is valuable to develop classification methods for such data. Due to the increase in XML documents, researchers are now focusing on applying typical text mining tasks such as text classification, text clustering and other related tasks to XML corpora. In this work, our objective is to give a survey on the classification of XML documents. As XML documents are basically text documents containing content and structure information, they can be classified based on i) structure only and ii) a combination of both structure and content. This paper gives a brief survey based on this classification.

Keywords: XML, classification, clustering, ontology, feature extraction, data mining, frequent pattern, XSLT, WordNet.

I. INTRODUCTION

XML is a popular format for data representation that allows textual content to be organized into logical structures.
Traditional information retrieval systems deal only with flat documents, but XML retrieval systems must also take the structure of documents into account along with their textual content. Every XML document includes both logical and physical structures. Based on these two kinds of information, XML documents can be classified using two approaches. One approach uses only the structural information of XML data in classification. Another one performs the classification by .
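To make the distinction between the two approaches concrete, the following minimal sketch shows one possible way to derive features from an XML document for each of them. It is an illustration only, not a method taken from the surveyed papers: the function names (structure_features, content_features, combined_features), the choice of root-to-node tag paths as structural features, the simple word tokenizer, and the sample document are all assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def structure_features(root):
    """Count root-to-node tag paths (structure only; text is ignored)."""
    counts = Counter()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        counts[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return counts


def content_features(root):
    """Count lowercase word tokens found in the text of all elements."""
    counts = Counter()
    for text in root.itertext():
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


def combined_features(root):
    """Merge both views; key prefixes keep paths and terms from colliding."""
    merged = Counter()
    for path, n in structure_features(root).items():
        merged["path:" + path] = n
    for term, n in content_features(root).items():
        merged["term:" + term] = n
    return merged


# Hypothetical sample document used only for illustration.
SAMPLE = (
    "<article><title>XML mining</title>"
    "<body><p>XML data</p><p>mining</p></body></article>"
)
root = ET.fromstring(SAMPLE)

print(structure_features(root))  # tag-path counts (structure-only view)
print(combined_features(root))   # tag paths plus terms (structure and content)
```

A structure-only classifier would be trained on vectors like those returned by structure_features, while a structure-and-content classifier would use something like combined_features. Any standard classifier could consume either representation; that choice is left open here.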

