Scientific report: "Weakly-Supervised Acquisition of Open-Domain Classes and Class Attributes from Web Documents and Query Logs"

Marius Paşca, Google Inc., Mountain View, California 94043, mars@
Benjamin Van Durme*, University of Rochester, Rochester, New York 14627, vandurme@
* Contributions made during an internship at Google.

Abstract

A new approach to large-scale information extraction exploits both Web documents and query logs to acquire thousands of open-domain classes of instances, along with relevant sets of open-domain class attributes at precision levels previously obtained only on small-scale, manually-assembled classes.

1 Introduction

Current methods for large-scale information extraction take advantage of unstructured text available from either Web documents (Banko et al., 2007; Snow et al., 2006) or, more recently, logs of Web search queries (Paşca, 2007) to acquire useful knowledge with minimal supervision. Given a manually-specified target attribute (e.g., birth years for people) and starting from as few as 10 seed facts such as (John Lennon, 1941), as many as a million facts of the same type can be derived from unstructured text within Web documents (Paşca et al., 2006). Similarly, given a manually-specified target class (e.g., Drug) with its instances (e.g., Vicodin and Xanax), and starting from as few as 5 seed attributes (e.g., side effects and maximum dose for Drug), other relevant attributes can be extracted for the same class from query logs (Paşca, 2007). These and other previous methods require the manual specification of the input classes of instances before any knowledge (e.g., facts or attributes) can be acquired for those classes.
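To make the seed-based setting concrete, the following is a minimal sketch of collecting candidate attributes for a manually specified class from a query log. It is not the procedure of the cited work: the "<attribute> of <instance>" pattern, the toy queries, and the names `candidate_attributes`, `DRUG_INSTANCES`, and `SEED_ATTRIBUTES` are all assumptions made for illustration. The seeds are used here only to separate known attributes from newly found ones; this excerpt does not describe how seeds guide ranking.

```python
import re
from collections import Counter

# Hypothetical inputs: a manually specified class (Drug) with its instances
# and a few seed attributes, mirroring the examples in the text.
DRUG_INSTANCES = {"vicodin", "xanax"}
SEED_ATTRIBUTES = {"side effects", "maximum dose"}

# Toy query log, invented for illustration only.
QUERIES = [
    "side effects of vicodin",
    "maximum dose of xanax",
    "side effects of xanax",
    "withdrawal symptoms of vicodin",
    "xanax online",
    "history of rome",
]

# Assumed extraction pattern: "<attribute> of <instance>".
PATTERN = re.compile(r"^(?P<attr>.+?) of (?P<inst>.+)$")


def candidate_attributes(queries, instances):
    """Count each attribute once per distinct instance it is queried with."""
    pairs = set()
    for query in queries:
        match = PATTERN.match(query.strip().lower())
        if match and match.group("inst") in instances:
            pairs.add((match.group("attr"), match.group("inst")))
    return Counter(attr for attr, _ in pairs)


counts = candidate_attributes(QUERIES, DRUG_INSTANCES)
ranked = counts.most_common()
# Attributes found in the log that were not among the seeds.
new_attributes = sorted(a for a in counts if a not in SEED_ATTRIBUTES)
```

Counting one vote per distinct instance, rather than per query, keeps a single heavily queried instance from dominating the ranking. The sketch still depends on the class and its instances being supplied by hand, which is the limitation of previous methods that the introduction points out.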
The extraction method introduced in this paper mines a collection of Web search queries and a collection of Web documents to acquire open-domain classes in the form of instance sets (e.g., whales, seals, dolphins, sea lions) associated with class labels (e.g., marine animals), as well as large sets of open-domain attributes for each class (e.g., circulatory ...)
