Survey on Structure and Structure-content Classification of XML Document
ISSN: 2249-5789
Thasleena N T, International Journal of Computer Science & Communication Networks, Vol 4(1), 22-26

Thasleena N T
Department of Computer Science
Rajagiri School of Engineering & Technology, Kochi
thalu555@

Varghese S C
Department of Computer Science
Rajagiri School of Engineering & Technology, Kochi
varghesesc@

Abstract: In recent years, XML has become a popular way of storing many data sets because of its semi-structured nature, which allows a wide variety of databases to be modeled as XML documents. XML data thus forms a significant part of the data mining domain, and it is valuable to develop classification methods for such data. With the growing number of XML documents, researchers are now applying typical text mining tasks, such as text classification and text clustering, to XML corpora. In this work, our objective is to give a survey on the classification of XML documents. As XML documents are basically text documents containing both content and structure information, they can be classified based on i) structure only and ii) a combination of structure and content. This paper gives a brief survey based on this classification.

Keywords: XML, classification, clustering, ontology, feature extraction, data mining, frequent pattern, XSLT, WordNet.

I. INTRODUCTION

XML is one of the popular structures for data representation; it allows textual content to be organized into logical structures.
Traditional information retrieval systems deal only with flat documents, whereas XML retrieval systems must also take the structure of documents into account along with their textual content. Every XML document includes both logical and physical structures. Based on this information, XML documents can be classified using two approaches. One approach uses only the structural information of XML data in classification. The other performs the classification by ...
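
To make the distinction between the two approaches concrete, the following is a minimal sketch of how structure-only and structure-plus-content features might be extracted from a single XML document. It is an illustration under stated assumptions, not a method taken from the surveyed papers: the function names `structure_features` and `content_features`, the sample document, and the choice of root-to-node tag paths and lowercase word counts as features are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document, used only for illustration.
SAMPLE = (
    "<article><title>XML mining</title>"
    "<body><p>XML data</p><p>mining text</p></body></article>"
)


def structure_features(root):
    """Count root-to-node tag paths; the text content is ignored."""
    paths = Counter()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        paths[path] += 1
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def content_features(root):
    """Count lowercase whitespace-separated words in all text nodes."""
    words = Counter()
    for text in root.itertext():
        words.update(text.lower().split())
    return words


root = ET.fromstring(SAMPLE)

# Approach i): structure only.
structure = structure_features(root)

# Approach ii): structure and content combined into one feature bag
# (prefixes keep tag paths and words from colliding).
content = content_features(root)
combined = Counter({f"path:{k}": v for k, v in structure.items()})
combined.update({f"word:{k}": v for k, v in content.items()})
```

Either feature bag could then be passed to any standard classifier. The sketch only shows where the two families of approaches differ, namely in which parts of the document contribute features.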