Scientific report: "A Statistical Model for Domain-Independent Text Segmentation"

Masao Utiyama and Hitoshi Isahara
Communications Research Laboratory
2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0289, Japan
mutiyama@ and isahara@

Abstract

We propose a statistical method that finds the maximum-probability segmentation of a given text. This method does not require training data because it estimates probabilities from the given text. Therefore, it can be applied to any text in any domain. An experiment showed that the method is more accurate than or at least as accurate as a state-of-the-art text segmentation system.

1 Introduction

Documents usually include various topics. Identifying and isolating topics by dividing documents, which is called text segmentation, is important for many natural language processing tasks, including information retrieval (Hearst and Plaunt, 1993; Salton et al., 1996) and summarization (Kan et al., 1998; Nakao, 2000). In information retrieval, users are often interested in particular topics (parts) of retrieved documents instead of the documents themselves. To meet such needs, documents should be segmented into coherent topics. Summarization is often used for a long document that includes multiple topics. A summary of such a document can be composed of summaries of the component topics. Identification of topics is the task of text segmentation.

A lot of research has been done on text segmentation (Kozima, 1993; Hearst, 1994; Okumura and Honda, 1994; Salton et al., 1996; Yaari, 1997; Kan et al., 1998; Choi, 2000; Nakao, 2000). A major characteristic of the methods used in this research is that they do not require training data to segment given texts. Hearst (1994), for example, used only the similarity of word distributions in a given text to segment the text. Consequently, these methods can be applied to any text in any domain, even if training data do not exist. This property is important when text segmentation is applied to information retrieval or summarization because both ...
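
To make the idea of training-free segmentation concrete, the following is a minimal sketch of segmenting a text using only the similarity of word distributions within that text. It is an illustration under stated assumptions, not Hearst's algorithm and not the probabilistic method proposed in this paper: the whitespace tokenizer, the window size, the cosine measure, the fixed threshold, and the function names `cosine`, `gap_similarities`, and `segment` are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def gap_similarities(sentences, window=2):
    """Compare word counts on either side of each gap between sentences.

    Entry k of the result describes the gap before sentence k + 1.
    The window size is an assumed parameter, not taken from the paper.
    """
    tokens = [s.lower().split() for s in sentences]
    sims = []
    for gap in range(1, len(tokens)):
        left = Counter(w for sent in tokens[max(0, gap - window):gap] for w in sent)
        right = Counter(w for sent in tokens[gap:gap + window] for w in sent)
        sims.append(cosine(left, right))
    return sims


def segment(sentences, window=2, threshold=0.1):
    """Return indices i such that a boundary is placed before sentence i.

    A boundary is hypothesized wherever the similarity across the gap
    falls below an assumed fixed threshold.
    """
    sims = gap_similarities(sentences, window)
    return [gap for gap, sim in enumerate(sims, start=1) if sim < threshold]


if __name__ == "__main__":
    example = [
        "the cat chased the mouse",
        "the mouse hid from the cat",
        "stock prices fell sharply today",
        "investors sold stock as prices fell",
    ]
    print(segment(example))
```

No labeled data is involved: every quantity is computed from the input sentences themselves, which is the property the passage above highlights. A fixed threshold is the simplest possible decision rule and would likely need tuning or replacement on real text.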




