A Generalized Vector Space Model for Text Retrieval Based on Semantic Relatedness

George Tsatsaronis and Vicky Panagiotopoulou
Department of Informatics, Athens University of Economics and Business
76 Patision Str., Athens, Greece
gbt@ vpanagiotopoulou@

Abstract

Generalized Vector Space Models (GVSM) extend the standard Vector Space Model (VSM) by embedding additional types of information, besides terms, in the representation of documents. An interesting type of information that can be used in such models is semantic information from word thesauri like WordNet. Previous attempts to construct GVSM reported contradictory results. The most challenging problem is to incorporate the semantic information in a theoretically sound and rigorous manner and to modify the standard interpretation of the VSM. In this paper we present a new GVSM that exploits WordNet's semantic information. The model is based on a new measure of semantic relatedness between terms. An experimental study conducted on three TREC collections reveals that semantic information can boost text retrieval performance with the use of the proposed GVSM.

1 Introduction

The use of semantic information in text retrieval or text classification has been controversial. For example, in Mavroeidis et al. (2005) it was shown that a GVSM using WordNet (Fellbaum, 1998) senses and their hypernyms improves text classification performance, especially for small training sets. In contrast, Sanderson (1994) reported that even 90% accurate word sense disambiguation (WSD) cannot guarantee retrieval improvement, though their experimental methodology was based only on randomly generated pseudowords of varying sizes. Similarly, Voorhees (1993) reported a drop in retrieval performance when the retrieval model was based on WSD information. On the contrary, the construction of a sense-based retrieval model by Stokoe et al. (2003) improved performance, while several years before Krovetz and Croft (1992) had already pointed out that resolving word senses can improve .
