Scientific report: "Text Segmentation Using Reiteration and Collocation"

Amanda C. Jobbins
Department of Computing, Nottingham Trent University, Nottingham NG1 4BU, UK
ajobbins@

Lindsay J. Evett
Department of Computing, Nottingham Trent University, Nottingham NG1 4BU, UK
lje@

Abstract

A method is presented for segmenting text into subtopic areas. The proportion of related pairwise words is calculated between adjacent windows of text to determine their lexical similarity. The lexical cohesion relations of reiteration and collocation are used to identify related words. These relations are automatically located using a combination of three linguistic features: word repetition, collocation and relation weights. This method is shown to successfully detect known subject changes in text and corresponds well to the segmentations placed by test subjects.

Introduction

Many examples of heterogeneous data can be found in daily life. The Wall Street Journal archives, for example, consist of a series of articles about different subject areas. Segmenting such data into distinct topics is useful for information retrieval, where only those segments relevant to a user's query can be retrieved. Text segmentation could also be used as a pre-processing step in automatic summarisation. Each segment could be summarised individually and then combined to provide an abstract for a document.
Previous work on text segmentation has used term matching to identify clusters of related text. Salton and Buckley (1992) and later Hearst (1994) extracted related text portions by matching high frequency terms. Yaari (1997) segmented text into a hierarchical structure, identifying sub-segments of larger segments. Ponte and Croft (1997) used word co-occurrences to expand the number of terms for matching. Reynar (1994) compared all words across a text rather than the more usual nearest neighbours. A problem with using word repetition is that inappropriate matches can be made because of the lack of contextual information.
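The window comparison summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the fixed window size in words, and the `related_pairs` set (a hand-written stand-in for the collocation and relation-weight resources, which this excerpt does not specify) are all assumptions made for illustration. Only word repetition is tested directly.

```python
from itertools import product

def related(a, b, related_pairs=frozenset()):
    """Hypothetical relatedness test: exact repetition, or a listed pair.

    `related_pairs` is an assumed stand-in for collocation and
    relation-weight lookups, which the excerpt does not define.
    """
    return a == b or (a, b) in related_pairs or (b, a) in related_pairs

def window_similarity(left, right, related_pairs=frozenset()):
    """Proportion of cross-window word pairs that are related."""
    if not left or not right:
        return 0.0
    hits = sum(related(a, b, related_pairs) for a, b in product(left, right))
    return hits / (len(left) * len(right))

def gap_scores(words, size, related_pairs=frozenset()):
    """Similarity of the two adjacent windows on either side of each gap."""
    return [
        window_similarity(words[i - size:i], words[i:i + size], related_pairs)
        for i in range(size, len(words) - size + 1)
    ]

# Toy input: two "topics" of four words each.
words = "bank loan bank interest river water river fish".split()
pairs = frozenset({("loan", "interest"), ("water", "fish")})
scores = gap_scores(words, 2, pairs)
print(scores)
```

In this toy input the lowest score falls at the gap between "interest" and "river", where the subject changes. How scores are turned into segment boundaries is not stated in this excerpt, so the sketch stops at the similarity values.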
