Scientific report: "Extracting Sequences from the Web"

Anthony Fader, Stephen Soderland, and Oren Etzioni
University of Washington, Seattle
{afader, soderlan, etzioni}@cs.washington.edu

Abstract

Classical Information Extraction (IE) systems fill slots in domain-specific frames. This paper reports on SEQ, a novel open IE system that leverages a domain-independent frame to extract ordered sequences such as presidents of the United States or the most common causes of death in the U.S. SEQ leverages regularities about sequences to extract a coherent set of sequences from Web text. SEQ nearly doubles the area under the precision-recall curve compared to an extractor that does not exploit these regularities.

1 Introduction

Classical IE systems fill slots in domain-specific frames, such as the time and location slots in seminar announcements (Freitag, 2000) or the terrorist organization slot in news stories (Chieu et al., 2003). In contrast, open IE systems are domain-independent but extract flat sets of assertions that are not organized into frames and slots (Sekine, 2006; Banko et al., 2007). This paper reports on SEQ, an open IE system that leverages a domain-independent frame to extract ordered sequences of objects from Web text. We show that the novel domain-independent sequence frame in SEQ substantially boosts the precision and recall of the system and yields coherent sequences filtered from low-precision extractions (Table 1).

Sequence extraction is distinct from set expansion (Etzioni et al., 2004; Wang and Cohen, 2007) because sequences are ordered and because the extraction process does not require seeds or HTML lists as input.

The domain-independent sequence frame consists of a sequence name s (e.g., presidents of the United States) and a set of ordered pairs (x, k), where x is a string naming a member of the sequence with name s and k is an integer indicating …

Most common cause of death in the United States
1. heart disease
2. cancer
3. stroke
4. COPD
5. pneumonia
6. cirrhosis
7. AIDS
8. chronic liver disease
9. …
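To make the sequence frame concrete, here is a minimal sketch of how a sequence name s and its set of (x, k) pairs might be held in memory. It is not SEQ's implementation. The class name SequenceFrame and its methods are hypothetical, and the code assumes k is the 1-based position of x in the sequence, since the excerpt breaks off before defining k. The conflict check is only an illustrative example of a regularity one could test; the paper's actual regularities are not described in this excerpt.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SequenceFrame:
    # Hypothetical representation of the sequence frame: a sequence name s
    # plus a set of (x, k) pairs. ASSUMPTION: k is read as the 1-based
    # position of member x in the sequence.
    name: str
    pairs: set[tuple[str, int]] = field(default_factory=set)

    def add(self, member: str, k: int) -> None:
        self.pairs.add((member, k))

    def ordered(self) -> list[str]:
        # Members sorted by k; ties broken alphabetically so output is stable.
        return [x for x, _ in sorted(self.pairs, key=lambda p: (p[1], p[0]))]

    def positions_in_conflict(self) -> dict[int, list[str]]:
        # Illustrative coherence check, not taken from the paper: report each
        # position k that more than one distinct member claims.
        by_k: dict[int, set[str]] = {}
        for member, k in self.pairs:
            by_k.setdefault(k, set()).add(member)
        return {k: sorted(ms) for k, ms in sorted(by_k.items()) if len(ms) > 1}


# Usage with the first entries of the example sequence quoted in the text.
frame = SequenceFrame("most common cause of death in the United States")
for k, x in enumerate(["heart disease", "cancer", "stroke", "COPD"], start=1):
    frame.add(x, k)

print(frame.ordered())                # ['heart disease', 'cancer', 'stroke', 'COPD']
print(frame.positions_in_conflict())  # {}

frame.add("pneumonia", 4)             # hypothetical second claim for k = 4
print(frame.positions_in_conflict())  # {4: ['COPD', 'pneumonia']}
```

Storing the pairs as a set mirrors the "set of ordered pairs" wording. The order lives in k rather than in the container, so the same member extracted twice at the same position collapses into one pair.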
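The abstract reports results as area under the precision-recall curve. The following is a minimal sketch of one common way to compute that area for a ranked list of extractions. It is not necessarily the procedure used in the paper, which this excerpt does not state. It assumes that extractions are already sorted by descending confidence, that each has a correct/incorrect judgment, and that the total number of correct answers is known for use as the recall denominator. The function names pr_points and area_under_pr are hypothetical. Libraries such as scikit-learn offer comparable utilities (precision_recall_curve with auc, or average_precision_score), which may use different interpolation conventions.

```python
from __future__ import annotations


def pr_points(labels: list[bool], total_correct: int) -> list[tuple[float, float]]:
    # labels: correctness judgments for extractions already sorted by
    # descending confidence. total_correct is an ASSUMED known number of
    # correct answers, used as the recall denominator.
    points = []
    true_pos = 0
    for rank, correct in enumerate(labels, start=1):
        true_pos += int(correct)
        points.append((true_pos / total_correct, true_pos / rank))  # (recall, precision)
    return points


def area_under_pr(points: list[tuple[float, float]]) -> float:
    # Trapezoidal rule over recall, anchoring the curve at recall 0 with the
    # first point's precision. This is one convention among several.
    if not points:
        return 0.0
    area = 0.0
    prev_recall, prev_precision = 0.0, points[0][1]
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area


ranked = [True, True, False, True]  # hypothetical judgments, best-first
print(area_under_pr(pr_points(ranked, total_correct=4)))  # ≈ 0.677 (65/96)
```

Under this convention, a perfect ranking of all correct answers scores 1.0. An extractor that pushes incorrect extractions toward the top of its ranking loses area, which is the kind of difference the abstract's "nearly doubles" comparison summarizes.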
