Scientific report: "Untangling the Cross-Lingual Link Structure of Wikipedia"

Gerard de Melo
Max Planck Institute for Informatics, Saarbrücken, Germany
demelo@

Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
weikum@

Abstract

Wikipedia articles in different languages are connected by interwiki links that are increasingly being recognized as a valuable source of cross-lingual information. Unfortunately, large numbers of links are imprecise or simply wrong. In this paper, techniques to detect such problems are identified. We formalize their removal as an optimization task based on graph repair operations. We then present an algorithm with provable properties that uses linear programming and a region growing technique to tackle this challenge. This allows us to transform Wikipedia into a much more consistent multilingual register of the world's entities and concepts.

1 Introduction

Motivation. The open, community-maintained encyclopedia Wikipedia has not only turned the Internet into a more useful and linguistically diverse source of information, but is also increasingly being used in computational applications as a large-scale source of linguistic and encyclopedic knowledge. To allow cross-lingual navigation, Wikipedia offers cross-lingual interwiki links that, for instance, connect the Indonesian article about Albert Einstein to the corresponding articles in over 100 other languages. Such links are extraordinarily valuable for cross-lingual applications. In the ideal case, a set of articles connected directly or indirectly via such links would all describe the same entity or concept.
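The ideal case described above can be checked mechanically on a small scale. The following is a minimal sketch, not the linear programming and region growing algorithm the paper presents: it groups articles into connected components and flags any component that contains more than one article from the same language edition. Treating such a clash as a sign of a faulty link is an assumption made for this illustration, and the function names and the toy link data are hypothetical.

```python
from collections import defaultdict


def connected_components(links):
    """Group articles into connected components, treating links as undirected."""
    adjacency = defaultdict(set)
    for source, target in links:
        adjacency[source].add(target)
        adjacency[target].add(source)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited article.
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def language_conflicts(component):
    """Return the languages that contribute more than one article to a component.

    Assumption for this sketch: two articles from the same language edition
    in one component indicate that some link in the component is wrong.
    """
    by_language = defaultdict(set)
    for language, title in component:
        by_language[language].add(title)
    return {lang: titles for lang, titles in by_language.items() if len(titles) > 1}


# Toy data, made up for illustration: articles are (language, title) pairs.
links = [
    (("en", "Economics"), ("de", "Wirtschaft")),
    (("de", "Wirtschaft"), ("fr", "Manager")),  # hypothetical imprecise link
    (("fr", "Manager"), ("en", "Manager")),
    (("en", "Albert Einstein"), ("id", "Albert Einstein")),
]

components = connected_components(links)
for component in components:
    conflicts = language_conflicts(component)
    status = "inconsistent" if conflicts else "consistent"
    print(status, sorted(component), conflicts)
```

A check like this only reports that a component is suspicious. It does not say which link to remove, which is the decision the paper formalizes as an optimization task over graph repair operations.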
Due to conceptual drift, different granularities, as well as mistakes made by editors, we frequently find concepts as different as economics and manager in the same connected component. Filtering out inaccurate links enables us to exploit Wikipedia's multilinguality in a much safer manner and allows us to create a multilingual register of named entities.

Contribution.
