TAILIEUCHUNG - Scientific report: "Examining the Content Load of Part of Speech Blocks for Information Retrieval"

Examining the Content Load of Part of Speech Blocks for Information Retrieval

Christina Lioma
Department of Computing Science
University of Glasgow
17 Lilybank Gardens
Scotland
xristina@

Iadh Ounis
Department of Computing Science
University of Glasgow
17 Lilybank Gardens
Scotland
ounis@

Abstract

We investigate the connection between part of speech (POS) distribution and content in language. We define POS blocks to be groups of parts of speech. We hypothesise that there exists a directly proportional relation between the frequency of POS blocks and their content salience. We also hypothesise that the class membership of the parts of speech within such blocks reflects the content load of the blocks, on the basis that open class parts of speech are more content-bearing than closed class parts of speech. We test these hypotheses in the context of Information Retrieval, by syntactically representing queries and removing from them content-poor blocks, in line with the aforementioned hypotheses. For our first hypothesis, we induce POS distribution information from a corpus and approximate the probability of occurrence of POS blocks as per two statistical estimators separately. For our second hypothesis, we use simple heuristics to estimate the content load within POS blocks.
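The block-based filtering described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a POS block is a contiguous sequence of n tags, it uses a plain relative-frequency estimate in place of the two statistical estimators (which are not specified here), and it treats nouns, verbs, adjectives and adverbs as open class, using Penn Treebank-style tag prefixes. All function names, thresholds and the toy data are hypothetical.

```python
from collections import Counter

# Assumption: open-class parts of speech, identified by Penn Treebank-style prefixes.
OPEN_CLASS_PREFIXES = ("NN", "VB", "JJ", "RB")


def pos_blocks(tags, n=3):
    """Return all contiguous blocks of n POS tags (an assumed block definition)."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def block_probabilities(tagged_sentences, n=3):
    """Relative-frequency estimate of each block's probability of occurrence."""
    counts = Counter()
    for tags in tagged_sentences:
        counts.update(pos_blocks(tags, n))
    total = sum(counts.values())
    return {block: count / total for block, count in counts.items()}


def open_class_ratio(block):
    """Fraction of tags in the block that belong to an open class."""
    return sum(tag.startswith(OPEN_CLASS_PREFIXES) for tag in block) / len(block)


def filter_query(tagged_query, probs, n=3, min_prob=0.1, min_open=0.5):
    """Keep only the words covered by at least one block passing both thresholds."""
    words = [word for word, _ in tagged_query]
    tags = [tag for _, tag in tagged_query]
    keep = set()
    for i, block in enumerate(pos_blocks(tags, n)):
        frequent = probs.get(block, 0.0) >= min_prob
        content_rich = open_class_ratio(block) >= min_open
        if frequent and content_rich:
            keep.update(range(i, i + n))
    return [words[i] for i in sorted(keep)]


# Toy hand-tagged corpus (hypothetical data, tags only).
corpus = [
    ["DT", "JJ", "NN", "VBZ", "DT", "NN"],
    ["DT", "NN", "VBZ", "JJ", "NN"],
    ["IN", "DT", "JJ", "NN"],
]
probs = block_probabilities(corpus, n=3)

# Hypothetical tagged query.
query = [("in", "IN"), ("the", "DT"), ("ancient", "JJ"), ("city", "NN")]
print(filter_query(query, probs))  # prints ['the', 'ancient', 'city']
```

In this toy run, the block (IN, DT, JJ) is dropped because only one of its three tags is open class, while (DT, JJ, NN) is both frequent in the corpus and mostly open class, so the words it covers are retained.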
We use the Text REtrieval Conference (TREC) queries of 1999 and 2000 to retrieve documents from the WT2G and WT10G test collections, with five different retrieval strategies. Experimental outcomes confirm that our hypotheses hold in the context of Information Retrieval.

1 Introduction

The task of an Information Retrieval (IR) system is to retrieve documents from a collection in response to a user need, which is expressed in the form of a query. Very often, this task is realised by indexing the documents in the collection with keyword descriptors. Retrieval consists in matching the query against the descriptors of the documents and returning the ones that appear closest in ranked
