Scientific report: "Extracting Paraphrases from Definition Sentences on the Web"

Chikara Hashimoto, Kentaro Torisawa, Stijn De Saeger, Jun'ichi Kazama, Sadao Kurohashi
National Institute of Information and Communications Technology, Kyoto 619-0237, JAPAN
Graduate School of Informatics, Kyoto University, Kyoto 606-8501, JAPAN
ch t torisawa stijn kazama @ kuro@

Abstract

We propose an automatic method of extracting paraphrases from definition sentences, which are also automatically acquired from the Web. We observe that a huge number of concepts are defined in Web documents, and that the sentences that define the same concept tend to convey mostly the same information using different expressions and thus contain many paraphrases. We show that a large number of paraphrases can be automatically extracted with high precision by regarding the sentences that define the same concept as parallel corpora. Experimental results indicated that with our method it was possible to extract about 300,000 paraphrases from 6 × 10^8 Web documents with a precision rate of about 94%.

1 Introduction

Natural language allows us to express the same information in many ways, which makes natural language processing (NLP) a challenging area. Accordingly, many researchers have recognized that automatic paraphrasing is an indispensable component of intelligent NLP systems (Iordanskaja et al., 1991; McKeown et al., 2002; Lin and Pantel, 2001; Ravichandran and Hovy, 2002; Kauchak and Barzilay, 2006; Callison-Burch et al., 2006) and have tried to acquire a large amount of paraphrase knowledge, which is a key to achieving robust automatic paraphrasing, from corpora (Lin and Pantel, 2001; Barzilay and McKeown, 2001; Shinyama et al., 2002; Barzilay and Lee, 2003).

We propose a method to extract phrasal paraphrases from pairs of sentences that define the same concept. The method is based on our observation that two sentences defining the same concept can be regarded as a parallel corpus since they largely convey the same .
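
The observation that sentences defining the same concept behave like a small parallel corpus can be illustrated with a toy sketch. This is not the authors' extraction method, which this excerpt does not describe. The function name `candidate_paraphrases`, the input dictionary, and the example sentences are all assumptions made for illustration, and the alignment relies on Python's standard `difflib` module rather than on any technique from the paper.

```python
from difflib import SequenceMatcher
from itertools import combinations


def candidate_paraphrases(definitions):
    """Yield (concept, phrase_a, phrase_b) for each span that differs between
    two definition sentences of the same concept.

    `definitions` is a hypothetical input format: a dict that maps a concept
    name to a list of sentences defining that concept.
    """
    for concept, sentences in definitions.items():
        # Treat every pair of same-concept definitions as a tiny parallel corpus.
        for sent_a, sent_b in combinations(sentences, 2):
            tokens_a, tokens_b = sent_a.split(), sent_b.split()
            matcher = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes():
                # A "replace" span is where the two sentences word the same
                # slot differently, so it is a (noisy) paraphrase candidate.
                if tag == "replace":
                    yield (
                        concept,
                        " ".join(tokens_a[i1:i2]),
                        " ".join(tokens_b[j1:j2]),
                    )


if __name__ == "__main__":
    # Invented example sentences, not data from the paper.
    example = {
        "osteoporosis": [
            "osteoporosis is a disease that makes bones fragile",
            "osteoporosis is a disease that reduces bone strength",
        ]
    }
    for concept, phrase_a, phrase_b in candidate_paraphrases(example):
        print(concept, "|", phrase_a, "<->", phrase_b)
```

A surface diff like this yields many false candidates on real Web text, so it only shows why paired definitions are a convenient source of paraphrase candidates; reaching the precision reported in the abstract would require the filtering and ranking that the paper itself describes.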
