Scientific report: "A Novel Approach to Semantic Indexing Based on Concept"

Bo-Yeong Kang
Department of Computer Engineering, Kyungpook National University
1370 Sangyukdong, Pukgu, Daegu, Korea (ROK)
comeng99@

Abstract
This paper proposes an efficient indexing method based on a concept vector space that is capable of representing the semantic content of a document. Two information measures, namely the information quantity and the information ratio, are defined to represent the degree of semantic importance within a document. The proposed method is expected to compensate for the limitations of term-frequency-based methods by exploiting related lexical items. Furthermore, with the information ratio, this approach is independent of document length.

1 Introduction
To improve the unstable performance of traditional keyword-based search, a Web document should include both an index and an index weight that represent the semantic content of the document. However, most previous work on indexing and weighting functions depends on statistical methods and is limited in its ability to extract exact indexes (Moens 2000). The objective of this paper is to propose a method that extracts indexes efficiently and weights them according to their degree of semantic importance in a document, using a concept vector space model. A document is regarded as a conglomerate concept that comprises many concepts.
Hence, an n-dimensional concept vector space model is defined in such a way that a document is recognized as a vector in n-dimensional concept space. We use lexical chains to extract concepts. With concept vectors and text vectors, semantic indexes and their degree of semantic importance are computed. Furthermore, the proposed indexing method has the advantage of being independent of document length, because we regard the overall text information as the value 1 and represent each index weight by its semantic information ratio relative to the overall text information.

2 Related Works
Since index terms are not equally .
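
The length-independence claim made at the end of the introduction can be illustrated with a minimal sketch. This excerpt does not define how the information quantity is computed, so the numeric quantities, the concept names, and the function name `information_ratios` below are all hypothetical. The sketch shows only the normalization step the introduction describes (overall text information treated as 1), not the author's full method.

```python
from typing import Dict


def information_ratios(quantities: Dict[str, float]) -> Dict[str, float]:
    """Divide each concept's information quantity by the document total.

    The total is treated as 1, so the returned weights sum to 1.
    """
    total = sum(quantities.values())
    if total <= 0:
        raise ValueError("total information quantity must be positive")
    return {concept: q / total for concept, q in quantities.items()}


# Hypothetical information quantities for two concepts in a short document.
short_doc = {"concept_a": 3.0, "concept_b": 1.0}

# The same concepts in a document ten times as long (assumed to scale evenly).
long_doc = {concept: 10 * q for concept, q in short_doc.items()}

short_weights = information_ratios(short_doc)
long_weights = information_ratios(long_doc)

print(short_weights)
print(long_weights)
```

Under the assumption that every concept's quantity scales with document length, both documents receive the same weights, which is the sense in which a ratio-based weight does not depend on length.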
