Scientific report: "A Comparison of Document, Sentence, and Term Event Spaces"

Catherine Blake
School of Information and Library Science
University of North Carolina at Chapel Hill, North Carolina, NC 27599-3360
cablake@

Abstract

The trend in information retrieval systems is from document to sub-document retrieval, such as sentences in a summarization system and words or phrases in a question-answering system. Despite this trend, systems continue to model language at the document level using the inverse document frequency (IDF). In this paper, we compare and contrast IDF with inverse sentence frequency (ISF) and inverse term frequency (ITF). A direct comparison reveals that all language models are highly correlated; however, the average ISF and ITF values are [...] and [...] higher than IDF. All language models appeared to follow a power law distribution with a slope coefficient of [...] for documents and [...] for sentences and terms. We conclude with an analysis of IDF stability with respect to random, journal, and section partitions of the 100,830 full-text scientific articles in our experimental corpus.

1 Introduction

The vector based information retrieval model identifies relevant documents by comparing query terms with terms from a document corpus.
The most common corpus weighting scheme is the term frequency (TF) x inverse document frequency (IDF), where TF is the number of times a term appears in a document and IDF reflects the distribution of terms within the corpus (Salton and Buckley, 1988). Ideally, the system should assign the highest weights to terms with the most discriminative power. One component of the corpus weight is the language model used. The most common language model is the inverse document frequency (IDF), which considers the distribution of terms between documents (see equation 1). IDF has played a central role in retrieval systems since it was first introduced more than thirty years ago (Sparck Jones, 1972).

$$\mathrm{IDF}(t_i) = \log_2 N - \log_2 n_i + 1 \tag{1}$$

$N$ is the total number of corpus documents; $n_i$ is ...
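As a rough illustration of equation 1 and the TF x IDF weighting described above, the following Python sketch computes an inverse frequency over a list of tokenized units. The function names (`inverse_frequency`, `tf_idf`) and the toy corpus are hypothetical and not taken from the paper. Passing sentences instead of documents would give an ISF-style value, but only under the assumption that ISF reuses the form of equation 1, which this excerpt does not confirm.

```python
import math
from collections import Counter


def inverse_frequency(units):
    """Map each term to log2(N) - log2(n_i) + 1, following equation 1.

    `units` is the event space: one token list per document (IDF).
    N is the number of units; n_i is the number of units containing term i.
    """
    n_units = len(units)
    unit_counts = Counter()
    for tokens in units:
        unit_counts.update(set(tokens))  # count each term once per unit
    return {
        term: math.log2(n_units) - math.log2(n_i) + 1
        for term, n_i in unit_counts.items()
    }


def tf_idf(tokens, idf):
    """Weight each term in one document by raw term frequency times IDF."""
    tf = Counter(tokens)
    return {term: tf[term] * idf[term] for term in tf if term in idf}


# Toy corpus (hypothetical): four tiny "documents", already tokenized.
docs = [
    ["gene", "expression", "gene"],
    ["protein", "expression"],
    ["gene", "protein", "binding"],
    ["expression", "analysis"],
]

idf = inverse_frequency(docs)
weights = tf_idf(docs[0], idf)
```

In this toy corpus, "binding" occurs in only one of the four documents and so receives a higher IDF than "gene", which occurs in two. This matches the goal of giving the most discriminative terms the largest weights.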
