Scientific paper: "Efficiently Accessing Wikipedia's Edit History"

Wikipedia Revision Toolkit: Efficiently Accessing Wikipedia's Edit History

Oliver Ferschke, Torsten Zesch and Iryna Gurevych
Ubiquitous Knowledge Processing Lab, Computer Science Department
Technische Universität Darmstadt, Hochschulstrasse 10, D-64289 Darmstadt, Germany
http

Abstract

We present an open-source toolkit which allows (i) reconstructing past states of Wikipedia, and (ii) efficiently accessing the edit history of Wikipedia articles. Reconstructing past states of Wikipedia is a prerequisite for reproducing previous experimental work based on Wikipedia. Beyond that, the edit history of Wikipedia articles has been shown to be a valuable knowledge source for NLP, but access is severely impeded by the lack of efficient tools for managing the huge amount of data provided. By using a dedicated storage format, our toolkit massively decreases the data volume to less than 2% of the original size, and at the same time provides an easy-to-use interface to access the revision data (an illustrative storage sketch appears at the end of this section). The language-independent design allows processing any language represented in Wikipedia. We expect this work to consolidate NLP research using Wikipedia in general, and to foster research making use of the knowledge encoded in Wikipedia's edit history.

1 Introduction

In the last decade, the free encyclopedia Wikipedia has become one of the most valuable and comprehensive knowledge sources in Natural Language Processing. It has been used for numerous NLP tasks, e.g. word sense disambiguation, semantic relatedness measures, or text categorization. A detailed survey on usages of Wikipedia in NLP can be found in Medelyan et al. (2009). The majority of Wikipedia-based NLP algorithms work on single snapshots of Wikipedia, which are published by the Wikimedia Foundation as XML dumps at irregular intervals. Such a snapshot only represents the state of Wikipedia at a certain fixed point in time, while Wikipedia actually is a dynamic resource that is constantly changed by its millions of editors.
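To make the snapshot setting concrete: the Wikimedia dumps follow the MediaWiki XML export format, in which each <page> element carries a <title> and, in the full-history dumps, one or more <revision> elements. The following StAX reader is a minimal sketch, not part of the toolkit described in the paper; the file name and class name are placeholders.

```java
import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

/**
 * Streams a Wikipedia XML dump and counts the revisions stored per page.
 * Assumes the standard MediaWiki export format; "pages-meta-history.xml"
 * is a placeholder for an actual (decompressed) dump file.
 */
public class DumpReader {
    public static void main(String[] args) throws Exception {
        XMLStreamReader xml = XMLInputFactory.newInstance()
                .createXMLStreamReader(new FileInputStream("pages-meta-history.xml"));
        String title = null;
        int revisions = 0;
        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                switch (xml.getLocalName()) {
                    case "page"     -> revisions = 0;                  // new article begins
                    case "title"    -> title = xml.getElementText();   // article name
                    case "revision" -> revisions++;                    // one stored edit
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && xml.getLocalName().equals("page")) {
                System.out.println(title + ": " + revisions + " revisions");
            }
        }
    }
}
```

Streaming with StAX rather than building a DOM matters here, since full-history dumps are far too large to hold in memory.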
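The abstract attributes the reduction to less than 2% of the original size (roughly fifty-fold) to a dedicated storage format. The section does not spell the format out, but one standard way to achieve such compression on revision data is delta encoding: store the first revision in full and every later revision only as its difference from the predecessor. The sketch below illustrates that idea under this assumption; all class and method names are hypothetical, not the toolkit's API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of delta-based revision storage: revision 0 is kept in
 * full, every later revision only as a single-span change against its
 * predecessor. Illustrative only, not the toolkit's actual format.
 */
public class RevisionStore {

    /** One edit: replace `length` chars at `offset` with `insertedText`. */
    record Delta(int offset, int length, String insertedText) {}

    private String base;                        // full text of revision 0
    private final List<Delta> deltas = new ArrayList<>();

    public void addRevision(String text) {
        if (base == null) {
            base = text;
            return;
        }
        // O(n) per add in this sketch: rebuild the latest revision, then diff.
        deltas.add(diff(reconstruct(deltas.size()), text));
    }

    /** Rebuild revision `index` by replaying deltas on top of the base text. */
    public String reconstruct(int index) {
        String text = base;
        for (int i = 0; i < index; i++) {
            Delta d = deltas.get(i);
            text = text.substring(0, d.offset())
                 + d.insertedText()
                 + text.substring(d.offset() + d.length());
        }
        return text;
    }

    /** Naive single-span diff: trim the common prefix and suffix. */
    private static Delta diff(String oldText, String newText) {
        int start = 0;
        int maxStart = Math.min(oldText.length(), newText.length());
        while (start < maxStart && oldText.charAt(start) == newText.charAt(start)) {
            start++;
        }
        int oldEnd = oldText.length(), newEnd = newText.length();
        while (oldEnd > start && newEnd > start
                && oldText.charAt(oldEnd - 1) == newText.charAt(newEnd - 1)) {
            oldEnd--;
            newEnd--;
        }
        return new Delta(start, oldEnd - start, newText.substring(start, newEnd));
    }

    public static void main(String[] args) {
        RevisionStore store = new RevisionStore();
        store.addRevision("Wikipedia is a free encyclopedia.");
        store.addRevision("Wikipedia is a free online encyclopedia.");
        store.addRevision("Wikipedia is a free multilingual online encyclopedia.");
        System.out.println(store.reconstruct(2));
        // -> Wikipedia is a free multilingual online encyclopedia.
    }
}
```

The trade-off is reconstruction cost: rebuilding revision k requires replaying k deltas, which delta-based systems commonly mitigate by interleaving periodic full snapshots between delta runs.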
